To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers $13$ common text generation tasks and their corresponding $83$ datasets, and further incorporates $45$ PLMs spanning general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement $4$ efficient training strategies and provide $4$ generation objectives for pre-training new PLMs from scratch. To be unified, we design interfaces that support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be fulfilled in a unified way. Despite its rich functionality, our library is easy to use, via either a friendly Python API or the command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at https://github.com/RUCAIBox/TextBox.