The evaluation of question answering models compares ground-truth annotations with model predictions. However, as of today, this comparison relies mostly on lexical overlap and therefore misses answers that share no tokens with the ground truth but are still semantically similar, treating correct answers as incorrect. This underestimation of the true performance of models hinders user acceptance in applications and complicates a fair comparison of different models. Therefore, there is a need for an evaluation metric that is based on semantics instead of pure string similarity. In this short paper, we present SAS, a cross-encoder-based metric for the estimation of semantic answer similarity, and compare it to seven existing metrics. To this end, we create an English and a German three-way annotated evaluation dataset containing pairs of answers along with human judgments of their semantic similarity, which we release along with an implementation of the SAS metric and the experiments. We find that semantic similarity metrics based on recent transformer models correlate much better with human judgment than traditional lexical similarity metrics on our two newly created datasets and one dataset from related work.
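To illustrate the idea of a cross-encoder-based semantic answer similarity score, the following minimal sketch uses the sentence-transformers library with a publicly available STS cross-encoder model and hypothetical answer pairs and human labels; it is an assumption for illustration, not the exact released implementation or model from this paper.

```python
# Illustrative sketch: scoring answer pairs with a cross-encoder and comparing
# the scores to human similarity judgments. The model name and the data below
# are assumptions chosen for demonstration purposes only.
from sentence_transformers import CrossEncoder
from scipy.stats import spearmanr

# Ground-truth answers paired with model predictions.
answer_pairs = [
    ("Albert Einstein", "Einstein"),
    ("the Eiffel Tower", "a tower in Paris"),
    ("42", "forty-two"),
]

# Hypothetical human judgments of semantic similarity for the same pairs
# (e.g., 0 = dissimilar, 1 = related, 2 = equivalent).
human_labels = [2, 1, 2]

# A cross-encoder encodes both answers jointly and outputs a similarity score,
# so answers without any lexical overlap can still receive a high score.
model = CrossEncoder("cross-encoder/stsb-roberta-large")
sas_scores = model.predict(answer_pairs)

# Metrics are typically compared by how well they correlate with human judgment.
correlation, _ = spearmanr(sas_scores, human_labels)
print(f"SAS scores: {sas_scores}")
print(f"Spearman correlation with human labels: {correlation:.3f}")
```

In contrast to a bi-encoder, which embeds each answer independently, the cross-encoder attends over both answers at once, which is why such models tend to produce more accurate pairwise similarity estimates at the cost of not yielding reusable embeddings.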