Paper Title
Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss
Authors
Abstract
Many recent source separation systems are designed to separate a fixed number of sources from a mixture. When the source activation patterns are unknown, such systems must either adjust the number of outputs or distinguish invalid outputs from valid ones. Iterative separation methods have gained much attention in the community because they can flexibly decide the number of outputs; however, (1) they typically rely on long-term information to determine when to stop iterating, which makes them hard to operate in a causal setting, and (2) they lack a "fault tolerance" mechanism for when the estimated number of sources differs from the actual number. In this paper, we propose a simple training method, auxiliary autoencoding permutation invariant training (A2PIT), to alleviate these two issues. A2PIT assumes a fixed number of outputs and uses an auxiliary autoencoding loss to force invalid outputs to be copies of the input mixture; during the inference phase, it detects invalid outputs in a fully unsupervised way. Experimental results show that A2PIT improves separation performance across various numbers of speakers and effectively detects the number of speakers in a mixture.
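The idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the loss function, the similarity measure, or the detection threshold, so the mean squared error, the cosine similarity, the value 0.95, and the function names `a2pit_style_loss` and `detect_valid_outputs` are all assumptions chosen for illustration.

```python
import itertools
import numpy as np


def mse(a, b):
    """Mean squared error between two 1-D signals (assumed loss, not from the paper)."""
    return float(np.mean((a - b) ** 2))


def a2pit_style_loss(outputs, sources, mixture):
    """Permutation invariant loss with mixture-padded targets.

    outputs: (N, T) array, the fixed number N of model outputs.
    sources: (K, T) array of the K <= N active sources.
    mixture: (T,) array, the input mixture.
    Returns (loss, best_perm), where best_perm[i] is the index of the
    target assigned to output i.
    """
    n_out = outputs.shape[0]
    n_pad = n_out - sources.shape[0]
    # Auxiliary autoencoding targets: the unused output slots are asked
    # to reproduce the input mixture itself.
    targets = np.concatenate([sources, np.tile(mixture, (n_pad, 1))], axis=0)

    def total(perm):
        return sum(mse(outputs[i], targets[j]) for i, j in enumerate(perm))

    # Exhaustive search over output-to-target assignments (fine for small N).
    best_perm = min(itertools.permutations(range(n_out)), key=total)
    return total(best_perm) / n_out, best_perm


def detect_valid_outputs(outputs, mixture, threshold=0.95):
    """Mark an output as invalid when it is nearly a copy of the mixture.

    Uses cosine similarity with an assumed threshold; needs no reference
    sources, so it can run at inference time.
    """
    eps = 1e-8
    norms = np.linalg.norm(outputs, axis=1) * np.linalg.norm(mixture) + eps
    similarity = (outputs @ mixture) / norms
    return similarity < threshold


# Toy usage: two active sources, a model with three output slots.
rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal((2, 1000))
mix = s1 + s2
ideal_outputs = np.stack([s2, mix, s1])  # slot 1 copies the mixture
loss, perm = a2pit_style_loss(ideal_outputs, np.stack([s1, s2]), mix)
valid = detect_valid_outputs(ideal_outputs, mix)
estimated_num_sources = int(valid.sum())
```

In this toy case the slot that copies the mixture is flagged as invalid, and counting the remaining slots gives an estimate of the number of sources. A real system would apply the same logic to network outputs and would likely use a signal-level objective suited to speech separation.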