We recently proposed DOVER-Lap, a method for combining the outputs of overlap-aware speaker diarization systems. DOVER-Lap improved upon its predecessor DOVER by using a label mapping method based on globally-informed greedy search. In this paper, we analyze this label mapping in the framework of a maximum orthogonal graph partitioning problem, and present three inferences. First, we show that the DOVER-Lap label mapping has time complexity exponential in the input size, which poses a challenge when combining a large number of hypotheses. Second, we revisit the DOVER label mapping algorithm and propose a modification which performs similarly to DOVER-Lap while being computationally tractable. We also derive an approximation bound for the algorithm in terms of the maximum number of hypothesis speakers. Finally, we describe a randomized local search algorithm which provides a near-optimal $(1-\epsilon)$-approximate solution to the problem with high probability. We empirically demonstrate the effectiveness of our methods on the AMI meeting corpus. Our code is publicly available: https://github.com/desh2608/dover-lap.
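To make the label mapping problem concrete, the following is a minimal toy sketch of mapping the speaker labels of one hypothesis onto another by maximizing total overlap duration. It is not the implementation from the repository above: the function names `overlap` and `map_labels`, the dictionary-of-segments input format, and the brute-force search over permutations (standing in for a proper assignment solver) are all assumptions made for illustration, and the sketch handles only two hypotheses.

```python
from itertools import permutations


def overlap(segs_a, segs_b):
    """Total time (seconds) during which both segment lists are active.

    Assumes the segments within each list do not overlap one another.
    """
    return sum(
        max(0.0, min(e1, e2) - max(s1, s2))
        for s1, e1 in segs_a
        for s2, e2 in segs_b
    )


def map_labels(ref, hyp):
    """Map each speaker in `hyp` to a distinct speaker in `ref`.

    ref, hyp: dict of speaker label -> list of (start, end) segments.
    Hypothetical helper: tries every assignment, so it is only usable
    for a handful of speakers. Assumes len(hyp) <= len(ref).
    """
    ref_spk = sorted(ref)
    hyp_spk = sorted(hyp)
    best_score, best_map = -1.0, {}
    for perm in permutations(ref_spk, len(hyp_spk)):
        # Score of an assignment = summed overlap of the paired speakers.
        score = sum(overlap(ref[r], hyp[h]) for r, h in zip(perm, hyp_spk))
        if score > best_score:
            best_score, best_map = score, dict(zip(hyp_spk, perm))
    return best_map, best_score


# Two toy hypotheses; speakers A and B overlap between 4.0 s and 5.0 s.
hyp1 = {"A": [(0.0, 5.0)], "B": [(4.0, 9.0)]}
hyp2 = {"x": [(4.5, 9.0)], "y": [(0.0, 4.0)]}
mapping, score = map_labels(hyp1, hyp2)
print(mapping, score)
```

With more than two hypotheses, the pairwise view above no longer suffices, since the mapping must be consistent across all hypotheses at once; that joint problem is what the graph partitioning formulation addresses.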