The definition generation task aims to automatically generate a definition for a word in a specific context. However, owing to the lack of datasets covering different complexity levels, the definitions produced by models tend to stay at a single complexity level. This paper proposes a novel task: generating definitions for a word with controllable complexity levels. Correspondingly, we introduce COMPILING, a dataset that provides detailed information about Chinese definitions, in which each definition is labeled with its complexity level. The COMPILING dataset includes 74,303 words and 106,882 definitions. To the best of our knowledge, it is the largest dataset for the Chinese definition generation task. We select various representative generation methods as baselines for this task and evaluate them; the results show that our dataset effectively helps models generate definitions at different complexity levels. We believe that the COMPILING dataset will benefit further research in complexity-controllable definition generation.
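To make the task concrete, the following is a minimal sketch of how a complexity-labeled definition record could be represented and turned into a model input. It assumes a control-token scheme (prefixing a tag such as `<level_2>` to the source text), a common way to condition sequence-to-sequence models that is not necessarily what the baselines here use. The `DefinitionEntry` fields, the integer level scale, the `[SEP]` separator, and the example strings are all hypothetical, not the COMPILING schema.

```python
from dataclasses import dataclass


# Hypothetical record layout: field names and the integer complexity
# scale are assumptions for illustration, not the COMPILING schema.
@dataclass(frozen=True)
class DefinitionEntry:
    word: str
    context: str      # example sentence in which the word appears
    definition: str   # target text the model should generate
    complexity: int   # assumed scale: higher value = more complex definition


def build_source(entry: DefinitionEntry) -> str:
    """Prefix a complexity control token to the model input (assumed technique)."""
    return f"<level_{entry.complexity}> {entry.word} [SEP] {entry.context}"


def group_by_word(entries):
    """Collect all labeled definitions per word, since one word may have several."""
    grouped = {}
    for e in entries:
        grouped.setdefault(e.word, []).append(e)
    return grouped


# Made-up toy entries: the same word defined at two complexity levels.
entries = [
    DefinitionEntry("苹果", "我吃了一个苹果。", "一种常见的水果。", 1),
    DefinitionEntry("苹果", "我吃了一个苹果。", "蔷薇科苹果属植物的果实。", 3),
]
for e in entries:
    print(build_source(e), "->", e.definition)
```

Because the level is part of the input string, a single model can be trained on all levels and asked for a simpler or more complex definition at inference time by changing only the prefix.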