In this paper, we propose QACE, a new metric based on Question Answering for Caption Evaluation. QACE generates questions about the evaluated caption and checks its content by asking these questions against either the reference caption or the source image. We first develop QACE-Ref, which compares the answers obtained from the evaluated caption to those from its reference, and report results competitive with state-of-the-art metrics. To go further, we propose QACE-Img, which asks the questions directly on the image instead of the reference. QACE-Img requires a Visual-QA system; unfortunately, standard VQA models are framed as classification over only a few thousand answer categories. Instead, we propose Visual-T5, an abstractive VQA system. The resulting metric, QACE-Img, is multi-modal, reference-less, and explainable. Our experiments show that QACE-Img compares favorably with other reference-less metrics. We will release the pre-trained models needed to compute QACE.