3D human motion generation is crucial for the creative industry. Recent advances rely on generative models with domain knowledge for text-driven motion generation, leading to substantial progress in capturing common motions. However, performance on more diverse motions remains unsatisfactory. In this work, we propose ReMoDiffuse, a diffusion-model-based motion generation framework that integrates a retrieval mechanism to refine the denoising process. ReMoDiffuse enhances the generalizability and diversity of text-driven motion generation with three key designs: 1) Hybrid Retrieval finds appropriate references from the database in terms of both semantic and kinematic similarities. 2) Semantic-Modulated Transformer selectively absorbs retrieval knowledge, adapting to the difference between retrieved samples and the target motion sequence. 3) Condition Mixture better utilizes the retrieval database during inference, overcoming the scale sensitivity of classifier-free guidance. Extensive experiments demonstrate that ReMoDiffuse outperforms state-of-the-art methods by balancing text-motion consistency and motion quality, especially for more diverse motion generation.
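To make the hybrid-retrieval idea concrete, the minimal sketch below scores database motions by combining a semantic term (cosine similarity between text embeddings) with a simple kinematic proxy based on motion-length mismatch. The embedding source, the length-based kinematic term, and the decay parameter `lam` are illustrative assumptions for this sketch, not ReMoDiffuse's exact scoring function.

```python
import numpy as np

def hybrid_retrieval_scores(query_text_emb, db_text_embs, query_length, db_lengths, lam=0.1):
    """Score each database motion for retrieval.

    Combines a semantic score (cosine similarity between the query text
    embedding and each database text embedding) with a kinematic proxy
    (exponential penalty on sequence-length mismatch). The weighting and the
    length-based term are assumptions of this sketch.
    """
    # Semantic similarity: cosine between normalized text embeddings.
    q = query_text_emb / np.linalg.norm(query_text_emb)
    db = db_text_embs / np.linalg.norm(db_text_embs, axis=1, keepdims=True)
    semantic = db @ q
    # Kinematic proxy: prefer database motions whose length matches the target.
    kinematic = np.exp(-lam * np.abs(np.asarray(db_lengths) - query_length))
    return semantic * kinematic

def retrieve_top_k(scores, k=4):
    """Return indices of the k highest-scoring database motions."""
    return np.argsort(scores)[::-1][:k]
```

In a full pipeline, the retrieved motions (and their text descriptions) would then be fed to the denoiser as additional conditions, which is where the semantics-modulated attention and condition mixture of ReMoDiffuse come into play.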