NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model

Yen-Ting Lin, Zhehuai Chen, Piotr Zelasko, Zhen Wan, Xuesong Yang, Zih-Ching Chen, Krishna C Puvvada, Szu-Wei Fu, Ke Hu, Jun Wei Chiu, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang, Chao-Han Huck Yang

From arXiv; ACL 2025 Industry Track. NeKo LMs: https://huggingface.co/nvidia/NeKo-v0-post-correction

Constructing a general-purpose post-recognition error corrector raises a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer lies in learning dataset-specific features and consolidating their knowledge in a single model. Previous methods achieve this with separate correction language models, which significantly increases the parameter count. In this work, we present Mixture-of-Experts (MoE) as a solution, highlighting that MoEs are much more than a scalability tool. We propose a Multi-Task Correction MoE, in which each expert is trained to become an "expert" on speech-to-text, language-to-text, or vision-to-text datasets by learning to route each dataset's tokens to its mapped expert. Experiments on the Open ASR Leaderboard show that NeKo sets a new state of the art, with an average relative WER reduction of 5.0% and substantial improvements in BLEU scores for speech and translation tasks. In zero-shot evaluation on the Hyporadise benchmark, NeKo outperforms GPT-3.5 and Claude-Opus with a 15.5% to 27.6% relative WER reduction. As a multi-task model, NeKo also performs competitively on grammar and post-OCR correction.
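
The abstract describes training experts by routing each dataset's tokens to a mapped expert. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes a top-2 router in which, during training, the first slot is forced to the task's mapped expert and the second slot is the router's best remaining choice; at inference the task is assumed unknown, so ordinary top-2 routing is used. The task names, the `TASK_TO_EXPERT` mapping, and the function `route_token` are all hypothetical.

```python
import math

# Hypothetical task-to-expert mapping; names and indices are illustrative only.
TASK_TO_EXPERT = {"asr": 0, "translation": 1, "ocr": 2, "grammar": 3}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, task=None):
    """Choose two experts for one token and return [(expert_index, weight), ...].

    router_logits: one score per expert, as produced by a learned router.
    task: task name during training, or None at inference (assumption:
          the task label is not available at inference time).
    """
    probs = softmax(router_logits)
    # Expert indices ordered from highest to lowest router probability.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if task is None:
        # Inference: plain top-2 routing from the router alone.
        chosen = ranked[:2]
    else:
        # Training: force the task's mapped expert into the first slot,
        # then let the router pick the best of the remaining experts.
        mapped = TASK_TO_EXPERT[task]
        runner_up = next(i for i in ranked if i != mapped)
        chosen = [mapped, runner_up]
    # Renormalize the two selected probabilities so the weights sum to 1.
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]
```

Under these assumptions, calling `route_token(logits, task="ocr")` always includes expert 2, so that expert sees every OCR token during training, while `route_token(logits)` leaves the choice entirely to the router.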
