Personas are useful for dialogue response prediction. However, the personas used in current studies are pre-defined and hard to obtain before a conversation. To tackle this issue, we study a new task, named Speaker Persona Detection (SPD), which aims to detect speaker personas from plain conversational text. In this task, the best-matched persona is selected from a set of candidates given the conversational text. This is a many-to-many semantic matching task because both contexts and personas in SPD are composed of multiple sentences. The long-term dependency and the dynamic redundancy among these sentences increase the difficulty of the task. We build a dataset for SPD, dubbed Persona Match on Persona-Chat (PMPC). Furthermore, we evaluate several baseline models and propose utterance-to-profile (U2P) matching networks for this task. The U2P models operate at a fine granularity, treating both contexts and personas as sets of multiple sequences. Each sequence pair is then scored, and an interpretable overall score for a context-persona pair is obtained through aggregation. Evaluation results show that the U2P models significantly outperform their baseline counterparts.
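To make the fine-grained matching idea concrete, the following is a minimal sketch of scoring every utterance-profile sentence pair and aggregating the pair scores into one context-persona score. It is not the U2P architecture itself: the abstract does not specify the scorer or the aggregation, so the bag-of-words cosine scorer, the max-then-mean aggregation, and all function names (`bow`, `cosine`, `score_matrix`, `aggregate`, `best_persona`) are assumptions chosen purely for illustration.

```python
import math
from collections import Counter


def bow(sentence):
    """Lowercased bag-of-words counts (an assumed stand-in for a learned encoder)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def score_matrix(utterances, profiles):
    """One score per (utterance, profile sentence) pair."""
    return [[cosine(bow(u), bow(p)) for p in profiles] for u in utterances]


def aggregate(matrix):
    """Assumed aggregation: best profile sentence per utterance, then the mean."""
    if not matrix or not matrix[0]:
        return 0.0
    return sum(max(row) for row in matrix) / len(matrix)


def best_persona(utterances, candidates):
    """Return the index of the highest-scoring candidate persona and all scores."""
    scores = [aggregate(score_matrix(utterances, persona)) for persona in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    context = ["i love my dog", "i work as a teacher"]
    candidates = [
        ["i have a dog", "i am a teacher"],
        ["i like pizza", "my car is red"],
    ]
    index, scores = best_persona(context, candidates)
    print(index, scores)
```

Because the pairwise score matrix is kept before aggregation, one can inspect which profile sentence each utterance matched best, which is the sense in which such an overall score is interpretable. In a learned model the cosine scorer would presumably be replaced by a trained sequence-pair matcher.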