In conversational settings, individuals exhibit unique behaviors, so a one-size-fits-all approach to response generation is insufficient for dialogue agents. Past studies have aimed to create personalized dialogue agents using speaker persona information, but they assume that the speaker's persona is already provided. This assumption does not always hold, especially for chatbots used in industries such as banking, hotel reservations, and airline bookings. This paper aims to fill this gap by exploring the task of Speaker Profiling in Conversations (SPC). The primary objective of SPC is to produce a summary of persona characteristics for each speaker in a dialogue. To accomplish this, we divide the task into three subtasks: persona discovery, persona-type identification, and persona-value extraction. Given a dialogue, the first subtask identifies all utterances that contain persona information. The second subtask then determines the type of persona information each of these utterances contains, and the third subtask extracts the specific persona value for each identified type. To address SPC, we have curated a new labeled dataset named SPICE. We evaluate various baselines on this dataset and benchmark it with a new neural model, SPOT, which we introduce in this paper. Furthermore, we present a comprehensive analysis of SPOT, examining the limitations of its individual modules both quantitatively and qualitatively.
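The three SPC subtasks can be read as a pipeline from a dialogue to one persona summary per speaker. The sketch below is only an illustration of that decomposition, not the SPOT model: SPOT is a neural model, whereas this sketch uses hand-written regular expressions as stand-ins. The `Utterance` class, the persona types `occupation` and `location`, the patterns, and all function names are assumptions made for the example.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    text: str


# Hypothetical toy patterns: persona type -> regex whose single capture
# group holds the persona value. A real system would learn these.
PERSONA_PATTERNS = {
    "occupation": re.compile(r"\bI work as an? (\w+)", re.IGNORECASE),
    "location": re.compile(r"\bI live in (\w+)", re.IGNORECASE),
}


def discover(dialogue):
    """Subtask 1 (persona discovery): keep utterances with persona information."""
    return [
        u for u in dialogue
        if any(p.search(u.text) for p in PERSONA_PATTERNS.values())
    ]


def identify_types(utterance):
    """Subtask 2 (persona-type identification): list the types an utterance carries."""
    return [t for t, p in PERSONA_PATTERNS.items() if p.search(utterance.text)]


def extract_value(utterance, persona_type):
    """Subtask 3 (persona-value extraction): pull the value for one type."""
    match = PERSONA_PATTERNS[persona_type].search(utterance.text)
    return match.group(1) if match else None


def profile_speakers(dialogue):
    """Chain the three subtasks into a per-speaker persona summary."""
    profiles = defaultdict(dict)
    for utterance in discover(dialogue):
        for persona_type in identify_types(utterance):
            value = extract_value(utterance, persona_type)
            if value is not None:
                profiles[utterance.speaker][persona_type] = value
    return dict(profiles)


# Made-up dialogue for illustration only.
dialogue = [
    Utterance("A", "Hi, I work as a nurse."),
    Utterance("B", "Nice. I live in Lisbon and need a flight."),
    Utterance("A", "Can you book a hotel for me?"),
]
profiles = profile_speakers(dialogue)
```

On this made-up dialogue, only the first two utterances pass discovery, and the resulting summary holds one persona type and value per speaker. Keeping the subtasks as separate functions mirrors the modular analysis described above, since each stage can be evaluated on its own.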