While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems. In this paper, we demonstrate that a multi-stage clustering strategy that uses different clustering algorithms for inputs of different lengths can address the multi-faceted challenges of on-device speaker diarization applications. Specifically, a fallback clusterer is used to handle short-form inputs; a main clusterer is used to handle medium-length inputs; and a pre-clusterer is used to compress long-form inputs before they are processed by the main clusterer. Both the main clusterer and the pre-clusterer can be configured with an upper bound on their computational complexity to adapt to devices with different resource constraints. This multi-stage clustering strategy is critical for streaming on-device speaker diarization systems, where the budgets of CPU, memory and battery are tight.
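The length-based dispatch described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function `multi_stage_cluster`, the thresholds `short_limit` and `main_limit`, and the three toy clusterers are hypothetical stand-ins, not the algorithms or parameters used in the paper. The sketch only shows how a fallback clusterer, a main clusterer, and a pre-clusterer could be chained so that the main clusterer never sees more than `main_limit` inputs.

```python
from typing import Callable, List, Sequence, Tuple

Embedding = Sequence[float]
Labels = List[int]


def multi_stage_cluster(
    embeddings: List[Embedding],
    fallback: Callable[[List[Embedding]], Labels],
    main: Callable[[List[Embedding]], Labels],
    pre_cluster: Callable[[List[Embedding], int], Tuple[List[Embedding], List[int]]],
    short_limit: int,  # hypothetical threshold: below this, use the fallback
    main_limit: int,   # hypothetical cap on the main clusterer's input size
) -> Labels:
    """Pick a clustering path based on the number of input embeddings."""
    n = len(embeddings)
    if n < short_limit:
        # Short-form input: too little data for the main clusterer.
        return fallback(embeddings)
    if n <= main_limit:
        # Medium-length input: the main clusterer handles it directly.
        return main(embeddings)
    # Long-form input: compress to at most main_limit representatives,
    # cluster those, then map each original embedding to its
    # representative's label.
    representatives, assignment = pre_cluster(embeddings, main_limit)
    rep_labels = main(representatives)
    return [rep_labels[a] for a in assignment]


# Toy stand-ins so the sketch runs end to end (assumptions, not real clusterers).
def one_speaker(embeddings: List[Embedding]) -> Labels:
    """Fallback stand-in: assign every embedding to speaker 0."""
    return [0] * len(embeddings)


def sign_clusterer(embeddings: List[Embedding]) -> Labels:
    """Main stand-in: split on the sign of the first coordinate."""
    return [0 if e[0] < 0 else 1 for e in embeddings]


def bucket_average(
    embeddings: List[Embedding], limit: int
) -> Tuple[List[Embedding], List[int]]:
    """Pre-clusterer stand-in: average consecutive runs into `limit` buckets.

    Assumes len(embeddings) > limit, so every bucket is non-empty.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    assignment = [i * limit // n for i in range(n)]
    sums = [[0.0] * dim for _ in range(limit)]
    counts = [0] * limit
    for e, a in zip(embeddings, assignment):
        counts[a] += 1
        for d in range(dim):
            sums[a][d] += e[d]
    reps = [[s / c for s in row] for row, c in zip(sums, counts)]
    return reps, assignment
```

In this sketch `main_limit` plays the role of the configurable upper bound: lowering it reduces the largest problem the main clusterer ever has to solve, at the cost of coarser compression of long inputs.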