In this paper, we release TextBox, an open-source library that provides a unified, modularized, and extensible framework for text generation. TextBox aims to support a broad set of text generation tasks and models. The library implements 21 text generation models on 9 benchmark datasets, covering the VAE, GAN, and pretrained language model categories. It maintains modularity and extensibility by decomposing the model architecture, inference, and learning process into highly reusable modules, which allows users to easily incorporate new models into the framework. These features make TextBox especially suitable for researchers and practitioners who want to quickly reproduce baseline models and develop new ones. TextBox is implemented in PyTorch and released under the Apache License 2.0 at https://github.com/RUCAIBox/TextBox.
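To make the decomposition into architecture, learning, and inference modules concrete, the following is a minimal sketch in plain Python. It is not TextBox's API: the names `MODEL_REGISTRY`, `register_model`, `Generator`, `BigramGenerator`, `train`, and `greedy_decode` are hypothetical and chosen only for illustration, and a toy bigram counter stands in for a neural model so that the example runs without PyTorch.

```python
from abc import ABC, abstractmethod
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Type

EOS = "<eos>"  # hypothetical end-of-sequence marker

# Hypothetical registry mapping a model name to its class (not TextBox's API).
MODEL_REGISTRY: Dict[str, Type["Generator"]] = {}


def register_model(name: str) -> Callable[[Type["Generator"]], Type["Generator"]]:
    """Class decorator that adds a model class to the registry under `name`."""
    def decorator(cls: Type["Generator"]) -> Type["Generator"]:
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


class Generator(ABC):
    """Architecture module: how a model is updated and how it picks a token."""

    @abstractmethod
    def update(self, sentence: List[str]) -> None:
        """Consume one training sentence."""

    @abstractmethod
    def next_token(self, prefix: List[str]) -> str:
        """Return the most likely token following `prefix`."""


@register_model("bigram")
class BigramGenerator(Generator):
    """Toy stand-in for a neural generator: counts adjacent token pairs."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def update(self, sentence: List[str]) -> None:
        for prev, nxt in zip(sentence, sentence[1:]):
            self.counts[prev][nxt] += 1

    def next_token(self, prefix: List[str]) -> str:
        followers = self.counts.get(prefix[-1])
        if not followers:
            return EOS  # unseen context: stop generating
        return followers.most_common(1)[0][0]


def train(model: Generator, corpus: List[List[str]]) -> None:
    """Learning module: depends only on the Generator interface."""
    for sentence in corpus:
        model.update(sentence + [EOS])


def greedy_decode(model: Generator, start: str, max_len: int = 10) -> List[str]:
    """Inference module: greedy decoding, reusable by any registered model."""
    tokens = [start]
    while len(tokens) < max_len:
        token = model.next_token(tokens)
        if token == EOS:
            break
        tokens.append(token)
    return tokens


corpus = [
    ["text", "generation", "models"],
    ["text", "generation", "models"],
    ["text", "box"],
]
model = MODEL_REGISTRY["bigram"]()  # look the model up by name
train(model, corpus)
generated = greedy_decode(model, "text")
```

Under this sketch, adding a new model amounts to writing one more `Generator` subclass and registering it; `train` and `greedy_decode` are reused unchanged, which is the kind of reuse the modular design is meant to enable.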