PIC: 用于理解和语义搜索的文本中的语句数据集 (PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic Search) - 专知论文

会员服务 ·

0

Learning · PIC · 模型评估 · 可理解性 · 自动问答 ·

2022 年 7 月 20 日

PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic Search

翻译：PIC: 用于理解和语义搜索的文本中的语句数据集

Thang M. Pham,Seunghyun Yoon,Trung Bui,Anh Nguyen

from arxiv, 25 pages, 12 figures

Since BERT (Devlin et al., 2018), learning contextualized word embeddings has been a de-facto standard in NLP. However, the progress of learning contextualized phrase embeddings is hindered by the lack of a human-annotated, phrase-in-context benchmark. To fill this gap, we propose PiC - a dataset of ~28K of noun phrases accompanied by their contextual Wikipedia pages and a suite of three tasks of increasing difficulty for evaluating the quality of phrase embeddings. We find that training on our dataset improves ranking models' accuracy and remarkably pushes Question Answering (QA) models to near-human accuracy which is 95% Exact Match (EM) on semantic search given a query phrase and a passage. Interestingly, we find evidence that such impressive performance is because the QA models learn to better capture the common meaning of a phrase regardless of its actual context. That is, on our Phrase Sense Disambiguation (PSD) task, SotA model accuracy drops substantially (60% EM), failing to differentiate between two different senses of the same phrase under two different contexts. Further results on our 3-task PiC benchmark reveal that learning contextualized phrase embeddings remains an interesting, open challenge.

翻译：自BERT (Devlin等人,2018年) 以来,学习背景化的字嵌入一直是NLP的脱facto标准。然而,学习背景化的字嵌入的进展受到缺少附加说明的文词嵌入基准的阻碍。为了填补这一空白,我们建议PIC——一个包含点名词的数据集,配以其背景维基百科页面,以及一套在评价词嵌入质量方面日益困难的三项任务。我们发现,关于我们数据集的培训提高了排名模型的准确性,并显著地将问题解答(QA)模型推向接近人的准确性,即:95%的语义搜索Exact Match (EM),配以一个查询短语和一段段落。有趣的是,我们发现,这种令人印象深刻的成绩是,因为QA模型学会更好地捕捉一个短语的共同含义,而不管其实际背景如何。这就是,在我们Phrase Sense diambiggguation(PSD) 任务中, Sota A模型精确度大幅下降 (60 % EM),没有区分两个不同背景化背景背景中学习的版本。

0

相关内容

Learning

【Meta AI】多模态理解研究进展，Advances in multimodal understanding research at Meta AI

【Meta AI】多模态理解研究进展，Advances in multimodal understanding research at Meta AI

专知会员服务

68+阅读 · 2022年3月20日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

2019年自然语言处理NLP亮点总结，29页pdf，NLP Year in Review — 2019 NLP highlights for the year 2019.

2019年自然语言处理NLP亮点总结，29页pdf，NLP Year in Review — 2019 NLP highlights for the year 2019.

专知会员服务

69+阅读 · 2020年1月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

RanGTPase核质运输系统通过调控AIF核转移介导细胞凋亡的分子机制

国家自然科学基金

0+阅读 · 2015年12月31日

E3泛素连接酶CUL4A在COPD小气道上皮细胞间质转化中的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

β2-AR /PCBP2相互作用在胰腺癌发生发展中的作用及机制

国家自然科学基金

0+阅读 · 2014年12月31日

Iotrochota属海绵及其内生菌中含氮化合物及其抗肿瘤活性研究

国家自然科学基金

0+阅读 · 2013年12月31日

PRIMA-1MET诱导p53突变型肺癌细胞自噬的分子机制及其与凋亡的相关性研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于海上与室内实验的海面溢油微波散射机理与建模研究

国家自然科学基金

0+阅读 · 2012年12月31日

柽柳Dof转录因子的耐盐调控机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

多能级多普勒系统中的Autler-Townes分裂效应以及宏观效应的影响

国家自然科学基金

0+阅读 · 2012年12月31日

肌动蛋白结合蛋白Transgelin在结直肠癌转移中的分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

高功率毫米波回旋行波放大器稳定性的研究

国家自然科学基金

0+阅读 · 2009年12月31日

Answering Numerical Reasoning Questions in Table-Text Hybrid Contents with Graph-based Encoder and Tree-based Decoder

Arxiv

0+阅读 · 2022年9月16日

A semantic hierarchical graph neural network for text classification

Arxiv

0+阅读 · 2022年9月15日

Multi-Task Learning for Visual Scene Understanding

Arxiv

29+阅读 · 2022年3月28日

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

Arxiv

20+阅读 · 2021年5月27日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Arxiv

15+阅读 · 2018年10月11日

Linguistically-Informed Self-Attention for Semantic Role Labeling

Arxiv

17+阅读 · 2018年8月28日

Incorporating Dictionaries into Deep Neural Networks for the Chinese Clinical Named Entity Recognition

Arxiv

12+阅读 · 2018年4月13日

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

Arxiv

18+阅读 · 2018年4月8日

Learning to Count Objects in Natural Images for Visual Question Answering

Arxiv

12+阅读 · 2018年2月15日

Deep Semantic Role Labeling with Self-Attention

Arxiv

13+阅读 · 2017年12月5日

VIP会员

文章信息

相关主题

相关VIP内容

【Meta AI】多模态理解研究进展，Advances in multimodal understanding research at Meta AI

【Meta AI】多模态理解研究进展，Advances in multimodal understanding research at Meta AI

专知会员服务

68+阅读 · 2022年3月20日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

2019年自然语言处理NLP亮点总结，29页pdf，NLP Year in Review — 2019 NLP highlights for the year 2019.

2019年自然语言处理NLP亮点总结，29页pdf，NLP Year in Review — 2019 NLP highlights for the year 2019.

专知会员服务

69+阅读 · 2020年1月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《代码、指挥与冲突：描绘军事人工智能的未来》报告

【斯坦福博士论文】面向地理空间数据的多模态与多尺度建模：时空生成式人工智能

美国启动“自有军事人工智能计划”：采用谷歌Gemini以推动全军人工智能应用

《创新与适应性作为军事成功的关键因素：来自俄乌战争的战略洞见》报告

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

相关论文

Answering Numerical Reasoning Questions in Table-Text Hybrid Contents with Graph-based Encoder and Tree-based Decoder

Arxiv

0+阅读 · 2022年9月16日

A semantic hierarchical graph neural network for text classification

Arxiv

0+阅读 · 2022年9月15日

Multi-Task Learning for Visual Scene Understanding

Arxiv

29+阅读 · 2022年3月28日

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

Arxiv

20+阅读 · 2021年5月27日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Arxiv

15+阅读 · 2018年10月11日

Linguistically-Informed Self-Attention for Semantic Role Labeling

Arxiv

17+阅读 · 2018年8月28日

Incorporating Dictionaries into Deep Neural Networks for the Chinese Clinical Named Entity Recognition

Arxiv

12+阅读 · 2018年4月13日

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

Arxiv

18+阅读 · 2018年4月8日

Learning to Count Objects in Natural Images for Visual Question Answering

Arxiv

12+阅读 · 2018年2月15日

Deep Semantic Role Labeling with Self-Attention

Arxiv

13+阅读 · 2017年12月5日

相关基金

RanGTPase核质运输系统通过调控AIF核转移介导细胞凋亡的分子机制

国家自然科学基金

0+阅读 · 2015年12月31日

E3泛素连接酶CUL4A在COPD小气道上皮细胞间质转化中的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

β2-AR /PCBP2相互作用在胰腺癌发生发展中的作用及机制

国家自然科学基金

0+阅读 · 2014年12月31日

Iotrochota属海绵及其内生菌中含氮化合物及其抗肿瘤活性研究

国家自然科学基金

0+阅读 · 2013年12月31日

PRIMA-1MET诱导p53突变型肺癌细胞自噬的分子机制及其与凋亡的相关性研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于海上与室内实验的海面溢油微波散射机理与建模研究

国家自然科学基金

0+阅读 · 2012年12月31日

柽柳Dof转录因子的耐盐调控机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

多能级多普勒系统中的Autler-Townes分裂效应以及宏观效应的影响

国家自然科学基金

0+阅读 · 2012年12月31日

肌动蛋白结合蛋白Transgelin在结直肠癌转移中的分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

高功率毫米波回旋行波放大器稳定性的研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员