Recent developments in speech signal processing, such as speech recognition and speaker diarization, have inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for speech technologies. Speaker diarization and multi-speaker automatic speech recognition (ASR) in meeting scenarios have attracted increasing attention. However, the lack of large-scale public real meeting data has been a major obstacle to the advancement of the field. Therefore, we release the \emph{AliMeeting} corpus, which consists of 120 hours of real recorded Mandarin meeting data, including far-field data collected by an 8-channel microphone array as well as near-field data collected by each participant's headset microphone. Moreover, we will launch the Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) as an ICASSP 2022 Signal Processing Grand Challenge. The challenge consists of two tracks, namely speaker diarization and multi-speaker ASR. In this paper, we provide a detailed introduction to the dataset, rules, evaluation methods, and baseline systems, aiming to further promote reproducible research in this field.