Conversation disentanglement, the task of identifying separate threads in a conversation, is an important pre-processing step in multi-party conversational NLP applications such as conversational question answering and conversation summarization. Framing it as an utterance-to-utterance classification problem -- i.e., given an utterance of interest (UOI), identify the past utterance it replies to -- we explore a number of transformer-based models and find that BERT combined with handcrafted features remains a strong baseline. We then build a multi-task learning model that jointly learns utterance-to-utterance and utterance-to-thread classification. Observing that the ground-truth label (past utterance) is often among the top candidates when our model makes an error, we experiment with bipartite graph matching as a post-processing step to learn how to best match a set of UOIs to past utterances. Experiments on the Ubuntu IRC dataset show that this approach has the potential to outperform the conventional greedy approach of independently selecting the highest-probability candidate for each UOI, indicating a promising direction for future research.
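To make the contrast between the two decoding strategies concrete, here is a minimal sketch of greedy per-UOI selection versus joint bipartite matching. It assumes a hypothetical `scores` matrix of model probabilities (not from the paper) and uses SciPy's Hungarian-algorithm solver under a simplified one-to-one assignment constraint; the paper's learned matching is more general.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical probabilities: scores[i, j] is the model's estimate that
# UOI i replies to past-utterance candidate j (values are illustrative).
scores = np.array([
    [0.60, 0.35, 0.05],
    [0.55, 0.40, 0.05],
])

# Greedy decoding: each UOI independently takes its highest-probability
# candidate, so both UOIs here claim candidate 0.
greedy = scores.argmax(axis=1)  # -> array([0, 0])

# Bipartite matching: jointly assign UOIs to candidates to maximize the
# total score under a one-to-one constraint.
rows, cols = linear_sum_assignment(scores, maximize=True)
# -> UOI 0 matched to candidate 0, UOI 1 to candidate 1
#    (total 0.60 + 0.40 = 1.00)
```

When the correct parent is a near-miss second candidate, as the abstract observes, this kind of joint assignment can recover it where independent argmax decoding cannot.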