UMFuse: 统一的多视图融合人体编辑应用 (UMFuse: Unified Multi View Fusion for Human Editing applications) - 专知论文

会员服务 ·

0

多视图 · 融合 · 潜在 · 融合网络 · 多源图像 ·

2023 年 3 月 28 日

UMFuse: Unified Multi View Fusion for Human Editing applications

翻译：UMFuse: 统一的多视图融合人体编辑应用

Rishabh Jain,Mayur Hemani,Duygu Ceylan,Krishna Kumar Singh,Jingwan Lu,Mausoom Sarkar,Balaji Krishnamurthy

from arxiv, 8 pages, 6 figures

Numerous pose-guided human editing methods have been explored by the vision community due to their extensive practical applications. However, most of these methods still use an image-to-image formulation in which a single image is given as input to produce an edited image as output. This objective becomes ill-defined in cases when the target pose differs significantly from the input pose. Existing methods then resort to in-painting or style transfer to handle occlusions and preserve content. In this paper, we explore the utilization of multiple views to minimize the issue of missing information and generate an accurate representation of the underlying human model. To fuse knowledge from multiple viewpoints, we design a multi-view fusion network that takes the pose key points and texture from multiple source images and generates an explainable per-pixel appearance retrieval map. Thereafter, the encodings from a separate network (trained on a single-view human reposing task) are merged in the latent space. This enables us to generate accurate, precise, and visually coherent images for different editing tasks. We show the application of our network on two newly proposed tasks - Multi-view human reposing and Mix&Match Human Image generation. Additionally, we study the limitations of single-view editing and scenarios in which multi-view provides a better alternative.

翻译：众多姿势引导的人体编辑方法已被视觉领域探索，因为它们具有广泛的实际应用。然而，大多数这些方法仍然使用单个图像作为输入，以产生编辑后的图像作为输出的图像到图像公式。当目标姿势与输入姿势差别显著时，此目标变得不明确。现有的方法通过修补或样式传输来处理遮挡并保留内容。在本文中，我们探讨了利用多个视角的可能性，以最小化缺失信息的问题并生成潜在人类模型的准确表示。为了融合来自多视角的知识，我们设计了一个多视图融合网络，它获取多源图像中的姿势关键点和纹理，并生成可解释的每像素外观检索映射。然后，从单视图重定位任务训练的单独网络的编码在潜在空间中进行合并。这使我们能够为不同的编辑任务生成准确、精确和视觉上一致的图像。我们在两个新提出的任务上展示了我们的网络应用——多视图人体重定位和混合匹配人类图像生成。此外，我们研究了单视图编辑的局限性以及多视图提供更好替代方案的情况。

0

相关内容

多视图

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

专知会员服务

18+阅读 · 2022年3月19日

【斯坦福CVPR2022】EG3D:高效的几何感知三维生成对抗网络，EG3D: Efficient Geometry-aware 3D Generative Adversarial Networks

【斯坦福CVPR2022】EG3D:高效的几何感知三维生成对抗网络，EG3D: Efficient Geometry-aware 3D Generative Adversarial Networks

专知会员服务

18+阅读 · 2022年3月15日

【斯坦福Kevin Chen博士论文】视觉、语言和具身AI的多模态表示， Multimodal representations for vision, language, and embodied AI

【斯坦福Kevin Chen博士论文】视觉、语言和具身AI的多模态表示， Multimodal representations for vision, language, and embodied AI

专知会员服务

64+阅读 · 2022年3月6日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

专知会员服务

21+阅读 · 2020年6月13日

【CVPR2020-Oral-牛津-Facebook】从单个图像进行端到端的视图合成，SynSin-View Synthesis

【CVPR2020-Oral-牛津-Facebook】从单个图像进行端到端的视图合成，SynSin-View Synthesis

专知会员服务

29+阅读 · 2020年3月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【字节跳动&Adobe】图割多模态风格迁移，Multimodal Style Transfer via Graph Cuts

【字节跳动&Adobe】图割多模态风格迁移，Multimodal Style Transfer via Graph Cuts

专知会员服务

15+阅读 · 2020年1月9日

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

专知会员服务

24+阅读 · 2019年12月15日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【泡泡点云时空】基于增量分割的3D点云定位方法（ICRA2018-4）

【泡泡点云时空】基于增量分割的3D点云定位方法（ICRA2018-4）

泡泡机器人SLAM

13+阅读 · 2018年10月7日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

专知

17+阅读 · 2018年6月7日

【论文推荐】最新八篇生成对抗网络相关论文—条件翻译、RGB-D动作识别、量子生成对抗网络、语义对齐、视频摘要、视觉-文本注意力

【论文推荐】最新八篇生成对抗网络相关论文—条件翻译、RGB-D动作识别、量子生成对抗网络、语义对齐、视频摘要、视觉-文本注意力

专知

15+阅读 · 2018年5月15日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

Generative Adversarial Text to Image Synthesis论文解读

Generative Adversarial Text to Image Synthesis论文解读

统计学习与视觉计算组

13+阅读 · 2017年6月9日

基于逆行调控皮层振荡的深部脑刺激作用机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

医学图像范例先验构造与虚拟多模态成像方法研究

国家自然科学基金

2+阅读 · 2014年12月31日

基于同步多双目立体视觉的高精度人体建模

国家自然科学基金

0+阅读 · 2014年12月31日

基于全向深度视觉的高精度人体肢体运动三维重建研究

国家自然科学基金

0+阅读 · 2014年12月31日

遥感立体、多视图影像压缩研究

国家自然科学基金

0+阅读 · 2012年12月31日

柽柳Dof转录因子的耐盐调控机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

MIMO认知无线电系统的最优线性联合收发机设计的统一框架研究

国家自然科学基金

0+阅读 · 2012年12月31日

数据驱动的不规则波浪建模方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于概率隐变量模型和深度信息的人体运动与体形估计研究

国家自然科学基金

1+阅读 · 2012年12月31日

增强现实中多目标3D跟踪定位和WH-SIFT特征识别方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation

Arxiv

0+阅读 · 2023年5月18日

LDM3D: Latent Diffusion Model for 3D

Arxiv

0+阅读 · 2023年5月18日

Controllable Mind Visual Diffusion Model

Arxiv

0+阅读 · 2023年5月18日

CostFormer:Cost Transformer for Cost Aggregation in Multi-view Stereo

Arxiv

0+阅读 · 2023年5月17日

GAN-Supervised Dense Visual Alignment

Arxiv

10+阅读 · 2021年12月9日

Multi-view Contrastive Graph Clustering

Arxiv

13+阅读 · 2021年10月22日

Decomposed Mutual Information Estimation for Contrastive Representation Learning

Arxiv

11+阅读 · 2021年6月25日

A survey on deep hashing for image retrieval

A survey on deep hashing for image retrieval

Arxiv

15+阅读 · 2020年6月10日

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

Arxiv

78+阅读 · 2019年11月10日

Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation

Arxiv

11+阅读 · 2017年12月27日

VIP会员

文章信息

相关主题

相关VIP内容

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

专知会员服务

18+阅读 · 2022年3月19日

【斯坦福CVPR2022】EG3D:高效的几何感知三维生成对抗网络，EG3D: Efficient Geometry-aware 3D Generative Adversarial Networks

【斯坦福CVPR2022】EG3D:高效的几何感知三维生成对抗网络，EG3D: Efficient Geometry-aware 3D Generative Adversarial Networks

专知会员服务

18+阅读 · 2022年3月15日

【斯坦福Kevin Chen博士论文】视觉、语言和具身AI的多模态表示， Multimodal representations for vision, language, and embodied AI

【斯坦福Kevin Chen博士论文】视觉、语言和具身AI的多模态表示， Multimodal representations for vision, language, and embodied AI

专知会员服务

64+阅读 · 2022年3月6日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

专知会员服务

21+阅读 · 2020年6月13日

【CVPR2020-Oral-牛津-Facebook】从单个图像进行端到端的视图合成，SynSin-View Synthesis

【CVPR2020-Oral-牛津-Facebook】从单个图像进行端到端的视图合成，SynSin-View Synthesis

专知会员服务

29+阅读 · 2020年3月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【字节跳动&Adobe】图割多模态风格迁移，Multimodal Style Transfer via Graph Cuts

【字节跳动&Adobe】图割多模态风格迁移，Multimodal Style Transfer via Graph Cuts

专知会员服务

15+阅读 · 2020年1月9日

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

专知会员服务

24+阅读 · 2019年12月15日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《小型无人机系统侦测追踪技术：声学、计算机视觉与深度学习融合方案》最新98页

《"牧羊人网格"拦截策略：实现无人机集群可靠拦截的新范式》

光纤无人机：反无人机系统的重大挑战

《作战建模与仿真实证研究》

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【泡泡点云时空】基于增量分割的3D点云定位方法（ICRA2018-4）

【泡泡点云时空】基于增量分割的3D点云定位方法（ICRA2018-4）

泡泡机器人SLAM

13+阅读 · 2018年10月7日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

专知

17+阅读 · 2018年6月7日

【论文推荐】最新八篇生成对抗网络相关论文—条件翻译、RGB-D动作识别、量子生成对抗网络、语义对齐、视频摘要、视觉-文本注意力

【论文推荐】最新八篇生成对抗网络相关论文—条件翻译、RGB-D动作识别、量子生成对抗网络、语义对齐、视频摘要、视觉-文本注意力

专知

15+阅读 · 2018年5月15日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

Generative Adversarial Text to Image Synthesis论文解读

Generative Adversarial Text to Image Synthesis论文解读

统计学习与视觉计算组

13+阅读 · 2017年6月9日

相关论文

VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation

Arxiv

0+阅读 · 2023年5月18日

LDM3D: Latent Diffusion Model for 3D

Arxiv

0+阅读 · 2023年5月18日

Controllable Mind Visual Diffusion Model

Arxiv

0+阅读 · 2023年5月18日

CostFormer:Cost Transformer for Cost Aggregation in Multi-view Stereo

Arxiv

0+阅读 · 2023年5月17日

GAN-Supervised Dense Visual Alignment

Arxiv

10+阅读 · 2021年12月9日

Multi-view Contrastive Graph Clustering

Arxiv

13+阅读 · 2021年10月22日

Decomposed Mutual Information Estimation for Contrastive Representation Learning

Arxiv

11+阅读 · 2021年6月25日

A survey on deep hashing for image retrieval

A survey on deep hashing for image retrieval

Arxiv

15+阅读 · 2020年6月10日

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

Arxiv

78+阅读 · 2019年11月10日

Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation

Arxiv

11+阅读 · 2017年12月27日

相关基金

基于逆行调控皮层振荡的深部脑刺激作用机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

医学图像范例先验构造与虚拟多模态成像方法研究

国家自然科学基金

2+阅读 · 2014年12月31日

基于同步多双目立体视觉的高精度人体建模

国家自然科学基金

0+阅读 · 2014年12月31日

基于全向深度视觉的高精度人体肢体运动三维重建研究

国家自然科学基金

0+阅读 · 2014年12月31日

遥感立体、多视图影像压缩研究

国家自然科学基金

0+阅读 · 2012年12月31日

柽柳Dof转录因子的耐盐调控机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

MIMO认知无线电系统的最优线性联合收发机设计的统一框架研究

国家自然科学基金

0+阅读 · 2012年12月31日

数据驱动的不规则波浪建模方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于概率隐变量模型和深度信息的人体运动与体形估计研究

国家自然科学基金

1+阅读 · 2012年12月31日

增强现实中多目标3D跟踪定位和WH-SIFT特征识别方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员