MUGEN: 视频-视频-视频-文字多模式理解和GEN的游戏场 (MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration) - 专知论文

会员服务 ·

0

多峰值 · 可理解性 · INTERACT · 数据集 · 讲稿 ·

2022 年 4 月 20 日

MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration

翻译：MUGEN: 视频-视频-视频-文字多模式理解和GEN的游戏场

Thomas Hayes,Songyang Zhang,Xi Yin,Guan Pang,Sasha Sheng,Harry Yang,Songwei Ge,Qiyuan Hu,Devi Parikh

Multimodal video-audio-text understanding and generation can benefit from datasets that are narrow but rich. The narrowness allows bite-sized challenges that the research community can make progress on. The richness ensures we are making progress along the core challenges. To this end, we present a large-scale video-audio-text dataset MUGEN, collected using the open-sourced platform game CoinRun [11]. We made substantial modifications to make the game richer by introducing audio and enabling new interactions. We trained RL agents with different objectives to navigate the game and interact with 13 objects and characters. This allows us to automatically extract a large collection of diverse videos and associated audio. We sample 375K video clips (3.2s each) and collect text descriptions from human annotators. Each video has additional annotations that are extracted automatically from the game engine, such as accurate semantic maps for each frame and templated textual descriptions. Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation. We benchmark representative approaches on tasks involving video-audio-text retrieval and generation. Our dataset and code are released at: https://mugen-org.github.io/.

翻译：多式视频-视频-视频-视频-视频-视频-视频-视频-视频文本理解和生成可受益于狭义但丰富的数据集。狭义允许研究界能够取得进展的细小挑战。丰富能确保我们在核心挑战中取得进展。为此,我们展示了大型视频-视频-视频-文本数据集MUGEN, 使用开放源平台游戏库inRun [11] 收集了大型视频-视频-视频-视频-文本数据集MUGEN。我们做了重大修改,通过引入音频和促成新互动,使游戏更加丰富。我们培训了具有不同目标的RL代理商,以浏览游戏,并与13个对象和字符互动。这使得我们能够自动提取大量不同的视频视频和相关音频视频和相关音频集。我们抽样了375K视频剪辑(各取3.2个),并收集了来自人类标识器的文本描述。每部视频都有从游戏引擎中自动提取的附加说明,例如每个框架的准确语义图和模板文本描述。总体而言,MUGEN能够帮助在多式理解和生成的许多任务中推进研究。我们为涉及视频- 视频- 文本检索和生成的任务设定了有代表性的方法基准。我们的数据设置和生成和代码发布和代码在http://ms/mugeniub.gio.gio.

0

相关内容

多峰值

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

中国田鼠亚科 Microtini族(Rodentia: Cricetidae: Arvicolinae)的分类与系统发育研究

国家自然科学基金

0+阅读 · 2014年12月31日

ADS中检测快中子束的GEM探测器的研制

国家自然科学基金

0+阅读 · 2013年12月31日

基于双CCD视觉测量系统的多目标、运动光纤位置检测研究

国家自然科学基金

0+阅读 · 2013年12月31日

极地探测机器人多翼帆风力直接驱动研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于行为的SoS体系结构评价研究

国家自然科学基金

1+阅读 · 2012年12月31日

金属凝固过程同步辐射X射线成像用多功能凝固装置

国家自然科学基金

0+阅读 · 2012年12月31日

基于ForCES的软件定义网络（SDN）研究

国家自然科学基金

1+阅读 · 2012年12月31日

几何阻挫体系ATO2中自旋、电荷、轨道序及其相互作用研究

国家自然科学基金

0+阅读 · 2011年12月31日

7A85铝合金的非等温时效行为研究

国家自然科学基金

0+阅读 · 2011年12月31日

科学传播与基础研究的相互作用研究

国家自然科学基金

0+阅读 · 2011年8月31日

Deep Hierarchical Planning from Pixels

Arxiv

0+阅读 · 2022年6月8日

Structure-Aware Transformer for Graph Representation Learning

Arxiv

1+阅读 · 2022年6月8日

Revisiting Click-based Interactive Video Object Segmentation

Arxiv

0+阅读 · 2022年6月7日

A Survey of Natural Language Generation

Arxiv

15+阅读 · 2021年12月22日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

Arxiv

11+阅读 · 2018年12月8日

Multimodal Sentiment Analysis To Explore the Structure of Emotions

Arxiv

19+阅读 · 2018年5月25日

Video Captioning via Hierarchical Reinforcement Learning

Arxiv

20+阅读 · 2018年3月29日

VIP会员

文章信息

相关主题

相关VIP内容

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《军事域人工智能风险、机遇与治理战略指导报告》2025最新76页报告

《杀伤网与精确规模：智能饱和战争时代的战略要务-印度视角》2025最新报告

俄乌冲突的地缘政治与军事教训（万字长文）

《弹药快速效能建模：推进互操作性与技术优势》2025最新26页报告

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Deep Hierarchical Planning from Pixels

Arxiv

0+阅读 · 2022年6月8日

Structure-Aware Transformer for Graph Representation Learning

Arxiv

1+阅读 · 2022年6月8日

Revisiting Click-based Interactive Video Object Segmentation

Arxiv

0+阅读 · 2022年6月7日

A Survey of Natural Language Generation

Arxiv

15+阅读 · 2021年12月22日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

Arxiv

11+阅读 · 2018年12月8日

Multimodal Sentiment Analysis To Explore the Structure of Emotions

Arxiv

19+阅读 · 2018年5月25日

Video Captioning via Hierarchical Reinforcement Learning

Arxiv

20+阅读 · 2018年3月29日

相关基金

中国田鼠亚科 Microtini族(Rodentia: Cricetidae: Arvicolinae)的分类与系统发育研究

国家自然科学基金

0+阅读 · 2014年12月31日

ADS中检测快中子束的GEM探测器的研制

国家自然科学基金

0+阅读 · 2013年12月31日

基于双CCD视觉测量系统的多目标、运动光纤位置检测研究

国家自然科学基金

0+阅读 · 2013年12月31日

极地探测机器人多翼帆风力直接驱动研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于行为的SoS体系结构评价研究

国家自然科学基金

1+阅读 · 2012年12月31日

金属凝固过程同步辐射X射线成像用多功能凝固装置

国家自然科学基金

0+阅读 · 2012年12月31日

基于ForCES的软件定义网络（SDN）研究

国家自然科学基金

1+阅读 · 2012年12月31日

几何阻挫体系ATO2中自旋、电荷、轨道序及其相互作用研究

国家自然科学基金

0+阅读 · 2011年12月31日

7A85铝合金的非等温时效行为研究

国家自然科学基金

0+阅读 · 2011年12月31日

科学传播与基础研究的相互作用研究

国家自然科学基金

0+阅读 · 2011年8月31日

微信扫码咨询专知VIP会员