Maria: A Visual Experience Powered Conversational Agent - Zhuanzhi Papers


June 23, 2021

Maria: A Visual Experience Powered Conversational Agent


Zujie Liang, Huang Hu, Can Xu, Chongyang Tao, Xiubo Geng, Yining Chen, Fan Liang, Daxin Jiang

From arXiv; accepted by the ACL 2021 main conference

Arguably, conversational agents' visual perception of the physical world is a key way for them to exhibit human-like intelligence. Image-grounded conversation is thus proposed to address this challenge. Existing work focuses on multimodal dialog models that ground the conversation in a given image. In this paper, we take a step further and study image-grounded conversation under a fully open-ended setting in which no paired dialog and image are assumed to be available. Specifically, we present Maria, a neural conversation agent powered by visual world experiences retrieved from a large-scale image index. Maria consists of three flexible components: a text-to-image retriever, a visual concept detector, and a visual-knowledge-grounded response generator. The retriever aims to retrieve an image correlated with the dialog from an image index, while the visual concept detector extracts rich visual knowledge from the image. The response generator is then grounded on the extracted visual knowledge and the dialog context to generate the target response. Extensive experiments demonstrate that Maria outperforms previous state-of-the-art methods on automatic metrics and human evaluation, and can generate informative responses that show some visual commonsense about the physical world.
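The three-stage flow described in the abstract (retrieve an image, extract visual concepts, generate a grounded response) can be illustrated with a toy sketch. This is not the authors' implementation: Maria uses neural models for each stage, whereas the version below substitutes word-overlap retrieval, pre-stored concept tags, and a template generator. All names here (`IndexedImage`, `retrieve_image`, `detect_concepts`, `generate_response`, `respond`) and the sample index are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class IndexedImage:
    """One entry of a toy image index (hypothetical structure)."""
    image_id: str
    concepts: List[str]  # stand-in for what a visual concept detector would output


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    tokens = (tok.strip(".,!?").lower() for tok in text.split())
    return {tok for tok in tokens if tok}


def retrieve_image(dialog_context: str, index: List[IndexedImage]) -> IndexedImage:
    """Toy text-to-image retriever: pick the image whose tags overlap the dialog most.

    Assumption: word overlap replaces the learned retriever of the paper.
    """
    query = tokenize(dialog_context)
    return max(index, key=lambda img: len(query & set(img.concepts)))


def detect_concepts(image: IndexedImage, top_k: int = 3) -> List[str]:
    """Toy visual concept detector: return the first top_k stored tags."""
    return image.concepts[:top_k]


def generate_response(dialog_context: str, concepts: List[str]) -> str:
    """Toy response generator: a fixed template conditioned on the concepts.

    Assumption: a real generator would also condition on dialog_context;
    this template ignores it.
    """
    return "That reminds me of " + ", ".join(concepts) + "."


def respond(dialog_context: str, index: List[IndexedImage]) -> str:
    """Chain the three components in the order the abstract describes."""
    image = retrieve_image(dialog_context, index)
    concepts = detect_concepts(image)
    return generate_response(dialog_context, concepts)


if __name__ == "__main__":
    toy_index = [
        IndexedImage("img_beach", ["beach", "sea", "sand", "umbrella"]),
        IndexedImage("img_kitchen", ["kitchen", "stove", "pan"]),
    ]
    print(respond("I went to the beach and swam in the sea.", toy_index))
```

The point of the sketch is the interface between the stages: no image is supplied with the dialog, so the retriever has to pick one from the index before any grounding can take place.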


