Ellipsis and coreference are common linguistic phenomena in human conversation. Although these phenomena are a means of making human-machine conversations more fluent and natural, only a few dialogue corpora explicitly indicate which turns contain ellipses and/or coreferences. In this paper we address the task of automatically detecting ellipses and coreferences in conversational question answering. We propose a multi-label classifier based on DistilBERT. Multi-label classification and active learning are employed to compensate for the limited amount of labeled data. We show that these methods greatly enhance the performance of the classifier in detecting these phenomena on a manually labeled dataset.
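To make the two ingredients concrete, the following is a minimal sketch, not the authors' implementation. It assumes the classifier emits one logit per label (ellipsis, coreference), that each label is decided independently with a sigmoid and a 0.5 threshold, and that active learning selects turns by uncertainty sampling; the abstract does not state which selection strategy is used, so that choice is an assumption. The logits, the example turns, and every function name are hypothetical stand-ins for the output of a DistilBERT-based model.

```python
import math

# The two phenomena named in the abstract; a turn may carry both, one, or neither.
LABELS = ("ellipsis", "coreference")


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Multi-label decision: each label is thresholded independently."""
    return {label: sigmoid(z) >= threshold for label, z in zip(LABELS, logits)}


def uncertainty(logits):
    """Score in [0, 1]; highest when some label probability is near 0.5."""
    return max(1.0 - 2.0 * abs(sigmoid(z) - 0.5) for z in logits)


def select_for_annotation(pool, k):
    """Assumed strategy: pick the k unlabeled turns the model is least sure about."""
    ranked = sorted(pool, key=lambda item: uncertainty(item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical (turn, logits) pairs standing in for classifier output.
pool = [
    ("And what about his brother?", (0.1, 2.5)),
    ("Who wrote Hamlet?", (-4.0, -3.5)),
    ("When?", (3.0, -0.2)),
]

to_label = select_for_annotation(pool, k=2)
```

In such a loop, the selected turns would be labeled manually, added to the training set, and the classifier retrained before the next selection round.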