Building the Intent Landscape of Real-World Conversational Corpora with Extractive Question-Answering Transformers - Zhuanzhi (专知) Papers


August 30, 2022

Building the Intent Landscape of Real-World Conversational Corpora with Extractive Question-Answering Transformers


Jean-Philippe Corbeil,Mia Taige Li,Hadi Abdi Ghavidel

For companies with customer service, mapping the intents inside their conversational data is crucial to building applications based on natural language understanding (NLU). Nevertheless, there is no established automated technique for gathering intents from noisy online chats or voice transcripts, and simple clustering approaches are not suited to intent-sparse dialogues. To solve this intent-landscape task, we propose an unsupervised pipeline that extracts both the intents and the taxonomy of intents from real-world dialogues. Our pipeline mines intent-span candidates with an extractive Question-Answering ELECTRA model and leverages sentence embeddings to apply a low-level density clustering followed by a top-level hierarchical clustering. Our results demonstrate the ability of an ELECTRA large model fine-tuned on the SQuAD2 dataset to generalize to understanding dialogues. With the right prompting question, this model achieves a linguistic validation rate above 85% on intent spans. We furthermore reconstructed the intent schemes of five domains from the MultiDoGo dataset with an average recall of 94.3%.
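
The two clustering stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithms, so plain DBSCAN stands in for the low-level density clustering and average-linkage agglomeration over cluster centroids stands in for the top-level hierarchical clustering. The question-answering span extraction and the sentence-embedding model are omitted; the 3-dimensional vectors, the example spans, and the `eps`, `min_pts`, and `threshold` values are all hypothetical and chosen only to make the example readable.

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def dbscan(points, eps, min_pts):
    """Stand-in density clustering: one label per point, -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:          # not a core point: provisional noise
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:          # noise reached from a core point: border
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:   # expand only through core points
                queue.extend(j_nbrs)
    return labels

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def agglomerate(centroids, threshold):
    """Average-linkage merging of low-level clusters into top-level groups."""
    groups = [[i] for i in range(len(centroids))]

    def linkage(g, h):
        total = sum(cosine_distance(centroids[i], centroids[j])
                    for i in g for j in h)
        return total / (len(g) * len(h))

    while len(groups) > 1:
        a, b = min(combinations(range(len(groups)), 2),
                   key=lambda p: linkage(groups[p[0]], groups[p[1]]))
        if linkage(groups[a], groups[b]) > threshold:
            break
        groups[a] = groups[a] + groups[b]
        del groups[b]                    # a < b, so index a stays valid
    return groups

# Hypothetical intent spans with hand-made toy "embeddings".
spans = ["book a flight", "reserve a flight", "book a plane ticket",
         "cancel my flight", "cancel the booking", "cancel my ticket",
         "reset my password", "change my password", "forgot my password",
         "thank you"]
vectors = [[1.00, 0.10, 0.00], [0.98, 0.12, 0.02], [0.99, 0.08, 0.01],
           [0.90, 0.45, 0.00], [0.88, 0.47, 0.02], [0.91, 0.44, 0.01],
           [0.00, 0.10, 1.00], [0.02, 0.12, 0.98], [0.01, 0.08, 0.99],
           [0.00, 1.00, 0.00]]

labels = dbscan(vectors, eps=0.01, min_pts=2)
n_clusters = max(labels) + 1
centroids = [centroid([v for v, l in zip(vectors, labels) if l == c])
             for c in range(n_clusters)]
taxonomy = agglomerate(centroids, threshold=0.2)

for group in taxonomy:
    print("top-level group:")
    for c in group:
        print("  intent:", [s for s, l in zip(spans, labels) if l == c])
```

On this toy data the first stage yields three low-level intents (booking, cancelling, password) and treats "thank you" as noise, and the second stage groups the two flight-related intents under one top-level node while leaving the password intent on its own. Discarding low-density points as noise is one plausible way to cope with dialogues in which most utterances carry no intent.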

