This paper introduces a model for incomplete utterance restoration (IUR) called JET (\textbf{J}oint learning token \textbf{E}xtraction and \textbf{T}ext generation). Unlike prior studies that work only on extraction or only on abstraction datasets, we design a simple but effective model that handles both IUR scenarios. Our design reflects the nature of IUR, in which tokens omitted from the context contribute to restoration. Accordingly, we construct a Picker that identifies the omitted tokens. To support the Picker, we design two label creation methods (soft and hard labels), which work even when no annotated data for the omitted tokens is available. Restoration is then performed by a Generator trained jointly with the Picker. Promising results on four benchmark datasets covering extraction and abstraction scenarios show that our model outperforms pretrained T5 and non-generative language model methods in both rich and limited training data settings.\footnote{The code is available at \url{https://github.com/shumpei19/JET}}
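To make the joint objective concrete, the following is a minimal, hypothetical sketch of the training setup the abstract describes: a Picker that scores which context tokens were omitted, and a Generator that restores the utterance, optimized together. All names here (\texttt{JointPickerGenerator}, \texttt{pick\_weight}, the GRU encoder standing in for the paper's T5-based model) are illustrative assumptions, not the authors' implementation.
\begin{verbatim}
# Hypothetical sketch, not the authors' code: joint learning of a
# Picker (token extraction) and a Generator (text generation).
import torch
import torch.nn as nn

class JointPickerGenerator(nn.Module):
    def __init__(self, vocab_size: int, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        # Picker: per-token probability that a context token was
        # omitted from the incomplete utterance.
        self.picker = nn.Linear(hidden_dim, 1)
        # Generator: decodes the restored utterance (a stand-in for a
        # pretrained seq2seq model such as T5).
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_ids, target_ids):
        enc_out, state = self.encoder(self.embed(context_ids))
        pick_logits = self.picker(enc_out).squeeze(-1)    # (B, T_ctx)
        dec_out, _ = self.decoder(self.embed(target_ids), state)
        gen_logits = self.lm_head(dec_out)                # (B, T_tgt, V)
        return pick_logits, gen_logits

def joint_loss(pick_logits, pick_labels, gen_logits, gen_labels,
               pick_weight: float = 0.5):
    # pick_labels may be hard {0,1} tags or soft probabilities,
    # mirroring the abstract's two label creation methods; BCE with
    # logits accepts both.
    pick_loss = nn.functional.binary_cross_entropy_with_logits(
        pick_logits, pick_labels)
    gen_loss = nn.functional.cross_entropy(
        gen_logits.reshape(-1, gen_logits.size(-1)),
        gen_labels.reshape(-1))
    return pick_weight * pick_loss + gen_loss

# Usage sketch:
# model = JointPickerGenerator(vocab_size=32000)
# pick_logits, gen_logits = model(context_ids, target_ids)
# loss = joint_loss(pick_logits, pick_labels, gen_logits, gen_labels)
\end{verbatim}
In this reading, the weighting term balances the auxiliary extraction signal against the main generation objective, so the Picker guides the Generator toward the omitted tokens without dominating training.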