Automatically and accurately identifying user intents and filling the associated slots from spoken language is critical to the success of dialogue systems. Traditional methods require manually defining a DOMAIN-INTENT-SLOT schema and asking many domain experts to annotate the corresponding utterances, on which neural models are then trained. In open-domain dialogue systems, this procedure raises the challenges of hindered information sharing, out-of-schema utterances, and data sparsity. To tackle these challenges, we explore a new task of {\em automatic intent-slot induction} and propose a novel domain-independent tool. Specifically, we design a coarse-to-fine three-step procedure consisting of Role-labeling, Concept-mining, And Pattern-mining (RCAP): (1) role-labeling: extracting keyphrases from users' utterances and classifying them into four coarsely-defined intent-roles via sequence labeling; (2) concept-mining: clustering the extracted intent-role mentions and naming each cluster as an abstract fine-grained concept; (3) pattern-mining: applying the Apriori algorithm to mine intent-role patterns and automatically inferring intents and slots from these coarse-grained intent-role labels and fine-grained concepts. Empirical evaluations on both real-world in-domain and out-of-domain datasets show that: (1) RCAP generates a satisfactory SLU schema and outperforms the state-of-the-art supervised learning method; (2) RCAP can be directly applied to out-of-domain datasets, gaining at least a 76\% improvement in F1-score on intent detection and a 41\% improvement in F1-score on slot filling; (3) RCAP performs generic intent-slot extraction with less manual effort, which opens pathways for schema induction on new domains and for discovering unseen intents and slots in generalizable dialogue systems.
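To make the pattern-mining step more concrete, the following is a minimal sketch of Apriori frequent-itemset mining over role-labeled utterances. It is an illustration under stated assumptions, not the RCAP implementation: the role names (`Action`, `Argument`, `Problem`, `Question`), the `role:concept` item encoding, the toy utterances, and the `min_count` threshold are all hypothetical and do not come from the text above.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): count} for itemsets seen in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level-1 candidates: every distinct single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Support counting: number of transactions containing each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-itemset candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical output of role-labeling plus concept-mining:
# each utterance is reduced to a set of "role:concept" items.
utterances = [
    {"Action:book", "Argument:flight"},
    {"Action:book", "Argument:flight", "Argument:date"},
    {"Action:book", "Argument:hotel"},
    {"Action:cancel", "Argument:flight"},
    {"Question:how", "Problem:refund"},
]

patterns = apriori(utterances, min_count=2)
for itemset, count in sorted(patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this toy setting, a frequent pattern such as `{Action:book, Argument:flight}` could be read as a candidate intent with an associated slot; how RCAP actually maps mined patterns to intents and slots is not specified in the text above.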