MPMQA: Multimodal Question Answering on Product Manuals - Zhuanzhi Papers



April 19, 2023

MPMQA: Multimodal Question Answering on Product Manuals


Liang Zhang, Anwen Hu, Jing Zhang, Shuo Hu, Qin Jin

Visual content, such as illustrations and images, plays a major role in understanding product manuals. Existing Product Manual Question Answering (PMQA) datasets tend to ignore visual content and retain only the textual parts. In this work, to emphasize the importance of multimodal content, we propose the Multimodal Product Manual Question Answering (MPMQA) task. For each question, MPMQA requires the model not only to process multimodal content but also to provide a multimodal answer. To support MPMQA, we construct PM209, a large-scale, human-annotated dataset containing 209 product manuals from 27 well-known consumer electronics brands. The human annotations cover 6 types of semantic regions for manual content and 22,021 question-answer pairs. In particular, each answer consists of a textual sentence and the related visual regions from the manual. Because product manuals are long and a question is always related to a small number of pages, MPMQA splits naturally into two subtasks: retrieving the most related pages and then generating multimodal answers. We further propose a unified model that performs both subtasks jointly and achieves performance comparable to multiple task-specific models. The PM209 dataset is available at https://github.com/AIM3-RUC/MPMQA.
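The two-subtask decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' unified model: it swaps in plain bag-of-words cosine similarity for both page retrieval and region selection, and every name in it (`Region`, `Page`, `MultimodalAnswer`, `retrieve_pages`, `answer`, the region `kind` labels) is a hypothetical stand-in rather than part of the PM209 schema or the MPMQA codebase.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Region:
    """A semantic region on a manual page (hypothetical schema)."""
    region_id: str
    kind: str   # illustrative label, e.g. "text" or "illustration"
    text: str   # text content or caption associated with the region


@dataclass
class Page:
    page_id: int
    regions: list


@dataclass
class MultimodalAnswer:
    """A textual sentence plus the ids of related regions."""
    text: str
    region_ids: list


def _bow(text):
    """Lower-cased, whitespace-tokenized bag of words."""
    return Counter(text.lower().split())


def _cosine(a, b):
    """Cosine similarity between two bag-of-words counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_pages(question, pages, top_k=1):
    """Subtask 1: rank pages by similarity to the question, keep top_k."""
    q = _bow(question)

    def score(page):
        return _cosine(q, _bow(" ".join(r.text for r in page.regions)))

    return sorted(pages, key=score, reverse=True)[:top_k]


def answer(question, pages):
    """Subtask 2: pick an answer sentence and the related regions.

    A real system would generate text; this toy version simply returns
    the best-matching region's text and all regions with non-zero overlap.
    """
    q = _bow(question)
    regions = [r for p in pages for r in p.regions]
    if not regions:
        return MultimodalAnswer(text="", region_ids=[])
    ranked = sorted(regions, key=lambda r: _cosine(q, _bow(r.text)), reverse=True)
    related = [r.region_id for r in ranked if _cosine(q, _bow(r.text)) > 0]
    return MultimodalAnswer(text=ranked[0].text, region_ids=related)


# Made-up two-page manual, purely for illustration.
manual = [
    Page(1, [Region("p1-r1", "text", "Press the power button to turn on the camera")]),
    Page(2, [
        Region("p2-r1", "text", "Insert the battery into the compartment"),
        Region("p2-r2", "illustration", "battery compartment diagram"),
    ]),
]
question = "How do I insert the battery"
top_pages = retrieve_pages(question, manual, top_k=1)
result = answer(question, top_pages)
print(top_pages[0].page_id, result.text, result.region_ids)
```

Running retrieval first keeps the second stage cheap, since it only has to look at regions from the few retrieved pages rather than the whole manual.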

