Harvesting question-answer (QA) pairs from customer service chatlogs in the wild is an efficient way to enrich the knowledge base of customer service chatbots in cold-start or continuous-integration scenarios. Prior work attempts to obtain 1-to-1 QA pairs from growing customer service chatlogs, but fails to integrate incomplete utterances from the dialogue context for composite QA retrieval. In this paper, we propose the N-to-N QA extraction task, in which the derived questions and corresponding answers may be separated across different utterances. We introduce a suite of tagging-based generative/discriminative methods with end-to-end and two-stage variants that perform well on 5 customer service datasets, and for the first time set up a benchmark for N-to-N DialogQAE with utterance- and session-level evaluation metrics. With a deep dive into the extracted QA pairs, we find that the relations between and inside the QA pairs can serve as indicators for analyzing the dialogue structure, e.g., information seeking, clarification, barge-in and elaboration. We also show that the proposed models adapt to different domains and languages, and reduce the labor cost of knowledge accumulation in a real-world production dialogue platform.
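To make the N-to-N setting concrete, the sketch below is a hypothetical illustration (not the paper's actual model) of how utterance-level tags could be grouped into QA pairs: each utterance carries a tag such as "Q1", "A1", or "O" (outside any pair), and utterances that share an index are merged, so a single question or answer may span several utterances and pairs may interleave across the session.

```python
from collections import defaultdict

def extract_qa_pairs(utterances, tags):
    """Group tagged utterances into N-to-N QA pairs.

    Tags are assumed to look like "Q<i>" / "A<i>" / "O"; this scheme is
    an illustrative assumption, not the tagging scheme from the paper.
    """
    questions, answers = defaultdict(list), defaultdict(list)
    for utt, tag in zip(utterances, tags):
        if tag.startswith("Q"):
            questions[tag[1:]].append(utt)
        elif tag.startswith("A"):
            answers[tag[1:]].append(utt)
    # Join multi-utterance questions/answers and align them by shared index.
    return [(" ".join(questions[i]), " ".join(answers[i]))
            for i in sorted(questions) if i in answers]

dialog = [
    "How do I reset my password?",       # question 1
    "Also, can I change my email?",      # question 2 (asked before answer 1)
    "Sure, go to Settings.",             # answer 1, part 1
    "Then click 'Reset password'.",      # answer 1, part 2
    "Email can be changed there too.",   # answer 2
]
tags = ["Q1", "Q2", "A1", "A1", "A2"]
print(extract_qa_pairs(dialog, tags))
```

In a real system the tags themselves would come from the generative or discriminative tagger; here they are supplied by hand only to show how separated and multi-utterance questions/answers are reassembled.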