Podcasts are conversational in nature, and speaker changes are frequent, so speaker diarization is required for content understanding. We propose an unsupervised technique for speaker diarization that does not rely on language-specific components. The algorithm is overlap-aware and does not require information about the number of speakers. Our approach shows a 79% improvement in purity scores (34% in F-score) over the Google Cloud Platform solution on podcast data.