Speaker modeling is essential for many related tasks, such as speaker recognition and speaker diarization. The dominant modeling approach is a fixed-dimensional vector representation, i.e., a speaker embedding. This paper introduces Wespeaker, a research- and production-oriented toolkit for speaker embedding learning. Wespeaker implements scalable data management, state-of-the-art speaker embedding models, loss functions, and scoring back-ends. Its structured recipes achieve highly competitive results and were adopted in the winning systems of several speaker verification challenges. A dedicated recipe also demonstrates the application to other downstream tasks such as speaker diarization. Moreover, CPU- and GPU-compatible deployment code is integrated for production-oriented development. The toolkit is publicly available at https://github.com/wenet-e2e/wespeaker.
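To make the idea of a fixed-dimensional embedding plus a scoring back-end concrete, here is a minimal sketch of cosine scoring for a speaker verification trial. Cosine similarity is one common back-end. The functions `cosine_score` and `verify` and the default threshold of 0.5 are hypothetical illustrations, not part of Wespeaker's API. In practice, the threshold would be tuned on development data.

```python
import math
from typing import Sequence


def cosine_score(enroll: Sequence[float], test: Sequence[float]) -> float:
    """Cosine similarity between two fixed-dimensional speaker embeddings."""
    if len(enroll) != len(test):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(enroll, test))
    norm_e = math.sqrt(sum(a * a for a in enroll))
    norm_t = math.sqrt(sum(b * b for b in test))
    if norm_e == 0.0 or norm_t == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_e * norm_t)


def verify(enroll: Sequence[float], test: Sequence[float], threshold: float = 0.5) -> bool:
    """Accept the trial as same-speaker if the score reaches a (hypothetical) threshold."""
    return cosine_score(enroll, test) >= threshold


# Toy 3-dimensional embeddings; real speaker embeddings typically have hundreds of dimensions.
print(verify([0.1, 0.9, 0.2], [0.12, 0.85, 0.25]))  # → True
```

Because cosine scoring depends only on the angle between embeddings, it needs no trained parameters. This makes it a simple default before trying a trained back-end.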