The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component of the speaker diarization system we submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge. These techniques are designed to handle multi-speaker conversations in real-world meetings with high speaker-overlap ratios and under heavily reverberant and noisy conditions. First, for data preparation and augmentation when training TS-VAD models, we use speech data containing both real meetings and simulated indoor conversations. Second, to refine the results of TS-VAD-based decoding, we apply a series of post-processing steps to the VAD outputs in order to reduce diarization error rates (DERs). Tested on the ALIMEETING corpus, the newly released Mandarin meeting dataset used in M2MeT, our proposed system reduces the DER by up to 66.55%/60.59% relative to classical clustering-based diarization on the Eval/Test sets.
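The abstract does not spell out the post-processing steps or the scoring details. As a rough illustration only, the sketch below assumes a common generic pipeline for one target speaker's TS-VAD output: threshold the frame posteriors, smooth them with a median-style filter, and discard very short segments. The function names and defaults (`threshold=0.5`, an 11-frame window, 0.2 s minimum duration, 10 ms frame shift) are hypothetical and are not the authors' settings. It also includes a simplified frame-level DER that applies no collar and assumes the hypothesis-to-reference speaker mapping is already known (official scoring tools such as NIST md-eval search for the optimal mapping), plus the relative-reduction formula behind figures such as 66.55%.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def majority_smooth(bits: List[int], window: int) -> List[int]:
    """Median-style smoothing of a 0/1 sequence: a frame becomes 1 when ones are
    a strict majority inside the (edge-truncated) window centred on it."""
    half = window // 2
    out = []
    for i in range(len(bits)):
        chunk = bits[max(0, i - half): i + half + 1]
        out.append(1 if 2 * sum(chunk) > len(chunk) else 0)
    return out


def probs_to_segments(probs: List[float], threshold: float = 0.5, window: int = 11,
                      min_dur: float = 0.2, frame_shift: float = 0.01) -> List[Segment]:
    """Turn one target speaker's frame-level TS-VAD probabilities into segments.

    Hypothetical pipeline: threshold -> majority smoothing -> drop short segments.
    """
    bits = majority_smooth([1 if p > threshold else 0 for p in probs], window)
    min_frames = round(min_dur / frame_shift)
    segments, start = [], None
    for i, b in enumerate(bits + [0]):  # trailing 0 closes a segment at the end
        if b and start is None:
            start = i
        elif not b and start is not None:
            if i - start >= min_frames:  # keep only segments long enough
                segments.append((round(start * frame_shift, 3), round(i * frame_shift, 3)))
            start = None
    return segments


def frame_der(ref: List[List[int]], hyp: List[List[int]]) -> float:
    """Simplified frame-level DER (no collar). ref[k][t] / hyp[k][t] is 1 when
    speaker k is active in frame t. ASSUMPTION: hyp speaker k maps to ref
    speaker k; real scoring searches for the optimal mapping."""
    err = total = 0
    for t in range(len(ref[0])):
        n_ref = sum(spk[t] for spk in ref)
        n_hyp = sum(spk[t] for spk in hyp)
        n_correct = sum(r[t] * h[t] for r, h in zip(ref, hyp))
        # missed speech + false alarm + speaker confusion in this frame
        err += max(n_ref, n_hyp) - n_correct
        total += n_ref
    return err / total


def relative_reduction(baseline: float, system: float) -> float:
    """Relative DER reduction in percent: (baseline - system) / baseline."""
    return 100.0 * (baseline - system) / baseline
```

For instance, with `window=5` the two-frame dip in `[0.9] * 10 + [0.2] * 2 + [0.9] * 10` is bridged by the smoothing step, and `probs_to_segments` returns the single segment `(0.0, 0.22)`. Smoothing before the minimum-duration filter is a design choice: it merges segments split by brief dips instead of discarding the fragments.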
