Permutation invariant training (PIT) is a widely used training criterion for neural network-based source separation, both for utterance-level separation with utterance-level PIT (uPIT) and for separation of long recordings with the recently proposed Graph-PIT. When implemented naively, both suffer from an exponential complexity in the number of utterances to separate, rendering them unusable for large numbers of speakers or long realistic recordings. We present a decomposition of the PIT criterion into the computation of a matrix and a strictly monotonically increasing function, so that the permutation or assignment problem can be solved efficiently with several search algorithms. The Hungarian algorithm can be used for uPIT, and we introduce various algorithms for the Graph-PIT assignment problem that reduce the complexity to polynomial in the number of utterances.
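To illustrate the uPIT case, the following is a minimal sketch (not the paper's implementation) of how the decomposition enables the Hungarian algorithm: a K x K pairwise loss matrix is computed once, and the optimal speaker assignment is found with scipy.optimize.linear_sum_assignment in O(K^3) instead of enumerating all K! permutations. The pairwise negative-SNR loss and the helper names (pairwise_neg_snr, upit_loss) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pairwise_neg_snr(estimates, targets, eps=1e-8):
    """Pairwise loss matrix: entry (i, j) scores estimate i against target j.

    estimates, targets: arrays of shape (K, T). Negative SNR is used here
    purely as an example of a loss that decomposes into pairwise terms.
    """
    K = estimates.shape[0]
    M = np.empty((K, K))
    for i in range(K):
        for j in range(K):
            noise = estimates[i] - targets[j]
            snr = 10 * np.log10(
                np.sum(targets[j] ** 2) / (np.sum(noise ** 2) + eps) + eps
            )
            M[i, j] = -snr
    return M


def upit_loss(estimates, targets):
    """uPIT loss via the Hungarian algorithm.

    Instead of enumerating all K! permutations, solve the linear sum
    assignment problem on the K x K pairwise loss matrix.
    """
    M = pairwise_neg_snr(estimates, targets)
    rows, cols = linear_sum_assignment(M)  # optimal assignment in O(K^3)
    return M[rows, cols].mean(), cols      # best loss and its permutation
```

In an actual training setup, the pairwise matrix would be computed with differentiable operations (e.g., in PyTorch), and the assignment indices returned by the solver would only be used to select which estimate-target pairs contribute to the backpropagated loss.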