MUG: A General Meeting Understanding and Generation Benchmark

Listening to long video or audio recordings of video conferences and online courses to acquire information is extremely inefficient. Even after automatic speech recognition (ASR) systems transcribe the recordings into long-form spoken language documents, reading the transcripts only partly speeds up information seeking. A range of NLP applications, such as keyphrase extraction, topic segmentation, and summarization, have been observed to significantly improve users' efficiency in grasping important information. The meeting scenario is among the most valuable for deploying these spoken language processing (SLP) capabilities. However, the lack of large-scale public meeting datasets annotated for these SLP tasks severely hinders their advancement. To promote SLP advancement, we establish a large-scale general Meeting Understanding and Generation Benchmark (MUG) to benchmark the performance of a wide range of SLP tasks, including topic segmentation, topic-level and session-level extractive summarization, topic title generation, keyphrase extraction, and action item detection. To facilitate the MUG benchmark, we construct and release the AliMeeting4MUG Corpus, a large-scale meeting dataset for comprehensive long-form SLP development. It consists of 654 recorded Mandarin meeting sessions with diverse topic coverage, with manual annotations for the SLP tasks made on manual transcripts of the meeting recordings. To the best of our knowledge, the AliMeeting4MUG Corpus is so far the largest meeting corpus in scale and facilitates most SLP tasks. In this paper, we provide a detailed introduction to the corpus, the SLP tasks and evaluation methods, and the baseline systems and their performance.
