This paper describes a method for overlap-aware speaker diarization. Given an overlap detector and a speaker embedding extractor, our method performs spectral clustering of segments, informed by the output of the overlap detector. This is achieved by transforming the discrete clustering problem into a convex optimization problem that is solved by eigen-decomposition. We then discretize the solution by alternating between singular value decomposition and a modified version of non-maximal suppression that is constrained by the output of the overlap detector. Furthermore, we detail an HMM-DNN based overlap detector that performs frame-level classification and enforces duration constraints through HMM state transitions. Our method achieves a test diarization error rate (DER) of 24.0% on the mixed-headset setting of the AMI meeting corpus, a relative improvement of 15.2% over a strong agglomerative hierarchical clustering baseline, and compares favorably with other overlap-aware diarization methods. Further analysis on the LibriCSS data demonstrates the effectiveness of the proposed method in high-overlap conditions.
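The clustering pipeline summarized above (relax, solve by eigen-decomposition, then discretize by alternating SVD with overlap-constrained non-maximal suppression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the row-based rotation initialization, the fixed iteration count, and the rule "an overlap segment receives its two highest-scoring speakers" are all assumptions made for illustration. It also assumes a precomputed segment affinity matrix, a known number of speakers (at least two), and a per-segment Boolean overlap flag.

```python
import numpy as np


def constrained_nms(scores, overlap_mask):
    """Keep the top speaker per segment, plus the runner-up on overlap segments."""
    rows = np.arange(scores.shape[0])
    order = np.argsort(-scores, axis=1)  # speakers sorted by descending score
    labels = np.zeros_like(scores)
    labels[rows, order[:, 0]] = 1.0
    # Assumed constraint: segments flagged as overlapped get a second speaker.
    labels[rows[overlap_mask], order[overlap_mask, 1]] = 1.0
    return labels


def init_rotation(relaxed):
    """Pick mutually dissimilar rows of the relaxed solution as starting axes."""
    num_segments, k = relaxed.shape
    rotation = np.zeros((k, k))
    rotation[:, 0] = relaxed[0]
    closeness = np.zeros(num_segments)
    for j in range(1, k):
        closeness += np.abs(relaxed @ rotation[:, j - 1])
        rotation[:, j] = relaxed[np.argmin(closeness)]
    return rotation


def overlap_aware_spectral_clustering(affinity, num_speakers, overlap_mask, num_iters=20):
    """Return an (N, K) binary assignment matrix; overlap rows may hold two ones."""
    affinity = np.asarray(affinity, dtype=float)
    overlap_mask = np.asarray(overlap_mask, dtype=bool)

    # Relaxed problem: top-K eigenvectors of the symmetrically normalized affinity.
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    normalized = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(normalized)  # eigenvalues in ascending order
    relaxed = eigvecs[:, -num_speakers:]
    norms = np.linalg.norm(relaxed, axis=1, keepdims=True)
    relaxed = relaxed / np.maximum(norms, 1e-12)

    # Discretization: alternate constrained NMS with an SVD-based rotation update.
    rotation = init_rotation(relaxed)
    for _ in range(num_iters):
        labels = constrained_nms(relaxed @ rotation, overlap_mask)
        # Orthogonal Procrustes step: rotation that best aligns relaxed to labels.
        u, _, vt = np.linalg.svd(labels.T @ relaxed)
        rotation = (u @ vt).T
    return labels
```

Calling `overlap_aware_spectral_clustering(affinity, num_speakers, overlap_mask)` returns one row per segment. In this sketch, the overlap detector's only role is to decide which rows may carry a second speaker; a real system would also need to map frame-level overlap decisions onto segments, which is not shown here.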