This paper introduces a model for incomplete utterance restoration (IUR). Unlike prior studies that work only on extraction or abstraction datasets, we design a simple but effective model that works in both IUR scenarios. Our design simulates the nature of IUR, in which omitted tokens from the context contribute to restoration. From this, we construct a Picker that identifies the omitted tokens. To support the Picker, we design two label-creation methods (soft and hard labels), which work even when the omitted tokens are not annotated. Restoration is performed by a Generator with the help of the Picker through joint learning. Promising results on four benchmark datasets in extraction and abstraction scenarios show that our model outperforms the pretrained T5 model and non-generative language-model methods in both rich and limited training-data settings. The code will also be available (https://github.com/shumpei19/jointiur).
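The abstract does not spell out how hard labels are created when omitted tokens are unannotated; a minimal sketch of one plausible scheme is shown below, assuming a hard label marks a context token as "omitted" when it appears in the gold restored utterance but not in the incomplete utterance (the function name and signature are illustrative, not the authors' implementation):

```python
def make_hard_labels(context_tokens, incomplete_tokens, restored_tokens):
    """Heuristic hard labels for a Picker: 1 if a context token was
    omitted from the incomplete utterance but restored in the gold
    output, else 0. Assumes simple set-based token matching."""
    omitted = set(restored_tokens) - set(incomplete_tokens)
    return [1 if tok in omitted else 0 for tok in context_tokens]

# Example: "like" and "pizza" come from the context into the restoration.
labels = make_hard_labels(
    context_tokens=["do", "you", "like", "pizza"],
    incomplete_tokens=["yes", "I", "do"],
    restored_tokens=["yes", "I", "like", "pizza"],
)
# → [0, 0, 1, 1]
```

Soft labels could analogously assign fractional scores (e.g., similarity-weighted) instead of binary values, but the exact formulation is left to the paper.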