MRR和NDCG视觉对话框模型组合 (Ensemble of MRR and NDCG models for Visual Dialog) - 专知论文

会员服务 ·

0

Performer · 秩 · state-of-the-art · MoDELS · 多词一义性 ·

2021 年 8 月 4 日

Ensemble of MRR and NDCG models for Visual Dialog

翻译：MRR和NDCG视觉对话框模型组合

from arxiv, Accepted to NAACL2021

Assessing an AI agent that can converse in human language and understand visual content is challenging. Generation metrics, such as BLEU scores favor correct syntax over semantics. Hence a discriminative approach is often used, where an agent ranks a set of candidate options. The mean reciprocal rank (MRR) metric evaluates the model performance by taking into account the rank of a single human-derived answer. This approach, however, raises a new challenge: the ambiguity and synonymy of answers, for instance, semantic equivalence (e.g., `yeah' and `yes'). To address this, the normalized discounted cumulative gain (NDCG) metric has been used to capture the relevance of all the correct answers via dense annotations. However, the NDCG metric favors the usually applicable uncertain answers such as `I don't know. Crafting a model that excels on both MRR and NDCG metrics is challenging. Ideally, an AI agent should answer a human-like reply and validate the correctness of any answer. To address this issue, we describe a two-step non-parametric ranking approach that can merge strong MRR and NDCG models. Using our approach, we manage to keep most MRR state-of-the-art performance (70.41% vs. 71.24%) and the NDCG state-of-the-art performance (72.16% vs. 75.35%). Moreover, our approach won the recent Visual Dialog 2020 challenge. Source code is available at https://github.com/idansc/mrr-ndcg.

翻译：评估一个能够以人类语言进行交流并理解视觉内容的AI 代理机构是具有挑战性的。代际指标, 如 BLEU 评分偏重正确的语法而不是语义学。因此, 常常使用一种歧视性的方法, 一种代理机构排行一套候选选项。平均对等等级( MRR) 衡量标准通过考虑单一人类答案的等级来评估模型的性能。但是, 这种方法提出了新的挑战: 答案的模糊性和同义性, 例如语义等同性( 例如, `ye' 和“ yes ” ) 。为了解决这个问题, 通常使用一种常规的折现累积收益( NDCG) 衡量标准来通过密集的描述所有正确答案的相关性。然而, NDCGGG 衡量标准赞成通常适用的不确定的答案, 如“ 我不知道” 。构建一个既优于 mRC 也优于 NDC 度的模型。理想的情况是, AI 一种像人一样的解答方法, 并验证任何答案的正确性。为了解决这个问题, 我们描述了最近一步的NCRD( 20 ) 最接近的不差的成绩排序方法。

0

相关内容

Performer

多样性文本生成任务的研究进展

专知会员服务

42+阅读 · 2021年4月23日

【经典书】机器学习白话书，97页pdf，Machine Learning for Humans

【经典书】机器学习白话书，97页pdf，Machine Learning for Humans

专知会员服务

84+阅读 · 2021年1月11日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

125+阅读 · 2020年11月20日

最新《知识图谱复杂问答》综述论文，A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges

最新《知识图谱复杂问答》综述论文，A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges

专知会员服务

70+阅读 · 2020年7月28日

【KDD2020】基于知识图谱的语义融合改进会话推荐系统，Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion

【KDD2020】基于知识图谱的语义融合改进会话推荐系统，Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion

专知会员服务

89+阅读 · 2020年7月9日

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

专知会员服务

64+阅读 · 2020年5月12日

【ACL2020】用于生成深度问题的语义图，Semantic Graphs for Generating Deep Questions

【ACL2020】用于生成深度问题的语义图，Semantic Graphs for Generating Deep Questions

专知会员服务

25+阅读 · 2020年5月5日

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

专知会员服务

145+阅读 · 2020年4月11日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

28+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

47+阅读 · 2019年10月17日

【基础】集成学习（Ensemble Learning）

【基础】集成学习（Ensemble Learning）

深度学习自然语言处理

4+阅读 · 2020年2月7日

已删除

将门创投

7+阅读 · 2019年10月15日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

26+阅读 · 2019年5月22日

LibRec 精选：位置感知的长序列会话推荐

LibRec 精选：位置感知的长序列会话推荐

LibRec智能推荐

3+阅读 · 2019年5月17日

VALSE Webinar 19-10期 Connecting Vision and Language to Action

VALSE Webinar 19-10期 Connecting Vision and Language to Action

VALSE

3+阅读 · 2019年4月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

15+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

42+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】视频目标分割基础

【推荐】视频目标分割基础

机器学习研究会

9+阅读 · 2017年9月19日

Semantic Grouping Network for Video Captioning

Semantic Grouping Network for Video Captioning

Arxiv

3+阅读 · 2021年2月3日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

Arxiv

6+阅读 · 2020年4月4日

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Arxiv

3+阅读 · 2019年5月10日

Multimodal Semantic Attention Network for Video Captioning

Arxiv

4+阅读 · 2019年5月8日

Learning When Not to Answer: A Ternary Reward Structure for Reinforcement Learning based Question Answering

Arxiv

6+阅读 · 2019年4月3日

Multi-task learning to improve natural language understanding

Arxiv

4+阅读 · 2018年12月17日

Dialog-based Interactive Image Retrieval

Arxiv

5+阅读 · 2018年5月1日

Long-Term Visual Object Tracking Benchmark

Arxiv

3+阅读 · 2018年3月22日

iVQA: Inverse Visual Question Answering

Arxiv

5+阅读 · 2018年3月16日

VIP会员

文章信息

相关主题

state-of-the-art

多词一义性

相关VIP内容

多样性文本生成任务的研究进展

专知会员服务

42+阅读 · 2021年4月23日

【经典书】机器学习白话书，97页pdf，Machine Learning for Humans

【经典书】机器学习白话书，97页pdf，Machine Learning for Humans

专知会员服务

84+阅读 · 2021年1月11日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

125+阅读 · 2020年11月20日

最新《知识图谱复杂问答》综述论文，A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges

最新《知识图谱复杂问答》综述论文，A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges

专知会员服务

70+阅读 · 2020年7月28日

【KDD2020】基于知识图谱的语义融合改进会话推荐系统，Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion

【KDD2020】基于知识图谱的语义融合改进会话推荐系统，Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion

专知会员服务

89+阅读 · 2020年7月9日

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

专知会员服务

64+阅读 · 2020年5月12日

【ACL2020】用于生成深度问题的语义图，Semantic Graphs for Generating Deep Questions

【ACL2020】用于生成深度问题的语义图，Semantic Graphs for Generating Deep Questions

专知会员服务

25+阅读 · 2020年5月5日

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

专知会员服务

145+阅读 · 2020年4月11日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

28+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

47+阅读 · 2019年10月17日

热门VIP内容

相关资讯

【基础】集成学习（Ensemble Learning）

【基础】集成学习（Ensemble Learning）

深度学习自然语言处理

4+阅读 · 2020年2月7日

已删除

将门创投

7+阅读 · 2019年10月15日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

26+阅读 · 2019年5月22日

LibRec 精选：位置感知的长序列会话推荐

LibRec 精选：位置感知的长序列会话推荐

LibRec智能推荐

3+阅读 · 2019年5月17日

VALSE Webinar 19-10期 Connecting Vision and Language to Action

VALSE Webinar 19-10期 Connecting Vision and Language to Action

VALSE

3+阅读 · 2019年4月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

15+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

42+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】视频目标分割基础

【推荐】视频目标分割基础

机器学习研究会

9+阅读 · 2017年9月19日

相关论文

Semantic Grouping Network for Video Captioning

Semantic Grouping Network for Video Captioning

Arxiv

3+阅读 · 2021年2月3日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

Arxiv

6+阅读 · 2020年4月4日

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Arxiv

3+阅读 · 2019年5月10日

Multimodal Semantic Attention Network for Video Captioning

Arxiv

4+阅读 · 2019年5月8日

Learning When Not to Answer: A Ternary Reward Structure for Reinforcement Learning based Question Answering

Arxiv

6+阅读 · 2019年4月3日

Multi-task learning to improve natural language understanding

Arxiv

4+阅读 · 2018年12月17日

Dialog-based Interactive Image Retrieval

Arxiv

5+阅读 · 2018年5月1日

Long-Term Visual Object Tracking Benchmark

Arxiv

3+阅读 · 2018年3月22日

iVQA: Inverse Visual Question Answering

Arxiv

5+阅读 · 2018年3月16日

微信扫码咨询专知VIP会员