RUST: 无姿态图像中的潜在神经场景表示 (RUST: Latent Neural Scene Representations from Unposed Imagery) - 专知论文

会员服务 ·

0

场景表示 · 神经场 · 潜在 · Rust · 表示 ·

2023 年 3 月 24 日

RUST: Latent Neural Scene Representations from Unposed Imagery

翻译：RUST: 无姿态图像中的潜在神经场景表示

Mehdi S. M. Sajjadi,Aravindh Mahendran,Thomas Kipf,Etienne Pot,Daniel Duckworth,Mario Lucic,Klaus Greff

from arxiv, CVPR 2023 Highlight. Project website: https://rust-paper.github.io/

Inferring the structure of 3D scenes from 2D observations is a fundamental challenge in computer vision. Recently popularized approaches based on neural scene representations have achieved tremendous impact and have been applied across a variety of applications. One of the major remaining challenges in this space is training a single model which can provide latent representations which effectively generalize beyond a single scene. Scene Representation Transformer (SRT) has shown promise in this direction, but scaling it to a larger set of diverse scenes is challenging and necessitates accurately posed ground truth data. To address this problem, we propose RUST (Really Unposed Scene representation Transformer), a pose-free approach to novel view synthesis trained on RGB images alone. Our main insight is that one can train a Pose Encoder that peeks at the target image and learns a latent pose embedding which is used by the decoder for view synthesis. We perform an empirical investigation into the learned latent pose structure and show that it allows meaningful test-time camera transformations and accurate explicit pose readouts. Perhaps surprisingly, RUST achieves similar quality as methods which have access to perfect camera pose, thereby unlocking the potential for large-scale training of amortized neural scene representations.

翻译：从2D观测中推断3D场景的结构是计算机视觉中的一个基本挑战。最近的基于神经场景表示的方法已经取得了巨大的影响，并已应用于各种应用。这个领域中仍然存在的一个主要挑战是训练一个单一的模型，它能够提供有效地泛化到单一场景之外的潜在表示。场景表示变换器（SRT）在这个方向上显示出了前景，但将其扩展到更大的不同场景集是具有挑战性的，并且需要准确姿态的地面事实数据。为了解决这个问题，我们提出了RUST（真正的非姿态场景表示变换器），这是一种仅依靠RGB图像进行训练的新视角合成的非姿态方法。我们的主要见解是，可以训练一个姿态编码器，该编码器查看目标图像并学习一个用于视角合成的潜在姿态嵌入。我们对学习到的潜在姿态结构进行了实证研究，并表明它允许有意义的测试时间相机变换和准确的显式姿态读数。也许令人惊讶的是，RUST实现了与具有完美相机姿态访问权限的方法类似的质量，从而释放了大规模训练摊销神经场景表示的潜力。

0

相关内容

场景表示

【CVPR2023】高保真自由可控的说话头视频生成

【CVPR2023】高保真自由可控的说话头视频生成

专知会员服务

21+阅读 · 2023年4月22日

如何生成复杂逼真3D场景？CVPR2023英伟达等提出《分层潜在扩散模型》生成复杂的开放世界3D场景

如何生成复杂逼真3D场景？CVPR2023英伟达等提出《分层潜在扩散模型》生成复杂的开放世界3D场景

专知会员服务

48+阅读 · 2023年4月20日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

专知会员服务

23+阅读 · 2022年3月3日

【MIT】从视频物理系统进行因果发现，Causal Discovery in Physical Systems from Videos

【MIT】从视频物理系统进行因果发现，Causal Discovery in Physical Systems from Videos

专知会员服务

26+阅读 · 2020年7月4日

【CVPR2020-英伟达】从图像集合中学习自监督视点，Self-Supervised Viewpoint Learning From Image Collections

【CVPR2020-英伟达】从图像集合中学习自监督视点，Self-Supervised Viewpoint Learning From Image Collections

专知会员服务

24+阅读 · 2020年4月4日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【DeepMind】PolyGen: 一种三维网格的自回归生成模型，PolyGen: An Autoregressive Generative Model of 3D Meshes

【DeepMind】PolyGen: 一种三维网格的自回归生成模型，PolyGen: An Autoregressive Generative Model of 3D Meshes

专知会员服务

37+阅读 · 2020年2月27日

【表示学习(Representation Learning)】8篇 NeurIPS 2019论文选读

专知会员服务

54+阅读 · 2019年12月22日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新八篇网络节点表示相关论文—可扩展嵌入、对抗自编码器、图划分、异构信息、显式矩阵分解、深度高斯、图、随机游走

【论文推荐】最新八篇网络节点表示相关论文—可扩展嵌入、对抗自编码器、图划分、异构信息、显式矩阵分解、深度高斯、图、随机游走

专知

14+阅读 · 2018年3月30日

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

专知

12+阅读 · 2018年2月2日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

量子Toroidal代数的表示、应用及推广

国家自然科学基金

1+阅读 · 2015年12月31日

基于视频图像处理的神经导航空间配准方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

地理场景协同的多摄像机目标跟踪模型

国家自然科学基金

1+阅读 · 2013年12月31日

几何与图像计算中的变分方法与算法

国家自然科学基金

0+阅读 · 2013年12月31日

高时空分辨率的动态对象几何与运动信息获取方法

国家自然科学基金

0+阅读 · 2012年12月31日

基于多维模糊数的多通道不确定信息的表示及融合方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向实时逼真绘制的非传统相机模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于超图形XGML的图像半结构化研究

国家自然科学基金

0+阅读 · 2012年12月31日

约化群酉表示的branching law及其应用

国家自然科学基金

0+阅读 · 2009年12月31日

Multi-modal Visual Understanding with Prompts for Semantic Information Disentanglement of Image

Arxiv

0+阅读 · 2023年5月16日

Exploring In-Context Learning Capabilities of Foundation Models for Generating Knowledge Graphs from Text

Arxiv

0+阅读 · 2023年5月15日

A Survey on Deep Learning-Based Monocular Spacecraft Pose Estimation: Current State, Limitations and Prospects

Arxiv

0+阅读 · 2023年5月12日

Model-based Programming: Redefining the Atomic Unit of Programming for the Deep Learning Era

Arxiv

0+阅读 · 2023年5月12日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

42+阅读 · 2023年4月19日

Deep Generative Models on 3D Representations: A Survey

Arxiv

15+阅读 · 2022年10月27日

Scene Graph Generation: A Comprehensive Survey

Arxiv

26+阅读 · 2022年1月3日

Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark

Arxiv

14+阅读 · 2021年11月11日

Image/Video Deep Anomaly Detection: A Survey

Arxiv

16+阅读 · 2021年3月2日

Diverse Image-to-Image Translation via Disentangled Representations

Diverse Image-to-Image Translation via Disentangled Representations

Arxiv

13+阅读 · 2018年8月2日

VIP会员

文章信息

相关主题

相关VIP内容

【CVPR2023】高保真自由可控的说话头视频生成

【CVPR2023】高保真自由可控的说话头视频生成

专知会员服务

21+阅读 · 2023年4月22日

如何生成复杂逼真3D场景？CVPR2023英伟达等提出《分层潜在扩散模型》生成复杂的开放世界3D场景

如何生成复杂逼真3D场景？CVPR2023英伟达等提出《分层潜在扩散模型》生成复杂的开放世界3D场景

专知会员服务

48+阅读 · 2023年4月20日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

专知会员服务

23+阅读 · 2022年3月3日

【MIT】从视频物理系统进行因果发现，Causal Discovery in Physical Systems from Videos

【MIT】从视频物理系统进行因果发现，Causal Discovery in Physical Systems from Videos

专知会员服务

26+阅读 · 2020年7月4日

【CVPR2020-英伟达】从图像集合中学习自监督视点，Self-Supervised Viewpoint Learning From Image Collections

【CVPR2020-英伟达】从图像集合中学习自监督视点，Self-Supervised Viewpoint Learning From Image Collections

专知会员服务

24+阅读 · 2020年4月4日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【DeepMind】PolyGen: 一种三维网格的自回归生成模型，PolyGen: An Autoregressive Generative Model of 3D Meshes

【DeepMind】PolyGen: 一种三维网格的自回归生成模型，PolyGen: An Autoregressive Generative Model of 3D Meshes

专知会员服务

37+阅读 · 2020年2月27日

【表示学习(Representation Learning)】8篇 NeurIPS 2019论文选读

专知会员服务

54+阅读 · 2019年12月22日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《多域空战指挥体系：驾驭复杂性的艺术》

构建军事人工智能信任体系始于破除黑盒机制

《生态建模密码破译：建模与编程实践》美陆军最新报告

《战争形态演变：合成兵种防御主导模式探析》48页slides

相关资讯

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新八篇网络节点表示相关论文—可扩展嵌入、对抗自编码器、图划分、异构信息、显式矩阵分解、深度高斯、图、随机游走

【论文推荐】最新八篇网络节点表示相关论文—可扩展嵌入、对抗自编码器、图划分、异构信息、显式矩阵分解、深度高斯、图、随机游走

专知

14+阅读 · 2018年3月30日

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

专知

12+阅读 · 2018年2月2日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

相关论文

Multi-modal Visual Understanding with Prompts for Semantic Information Disentanglement of Image

Arxiv

0+阅读 · 2023年5月16日

Exploring In-Context Learning Capabilities of Foundation Models for Generating Knowledge Graphs from Text

Arxiv

0+阅读 · 2023年5月15日

A Survey on Deep Learning-Based Monocular Spacecraft Pose Estimation: Current State, Limitations and Prospects

Arxiv

0+阅读 · 2023年5月12日

Model-based Programming: Redefining the Atomic Unit of Programming for the Deep Learning Era

Arxiv

0+阅读 · 2023年5月12日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

42+阅读 · 2023年4月19日

Deep Generative Models on 3D Representations: A Survey

Arxiv

15+阅读 · 2022年10月27日

Scene Graph Generation: A Comprehensive Survey

Arxiv

26+阅读 · 2022年1月3日

Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark

Arxiv

14+阅读 · 2021年11月11日

Image/Video Deep Anomaly Detection: A Survey

Arxiv

16+阅读 · 2021年3月2日

Diverse Image-to-Image Translation via Disentangled Representations

Diverse Image-to-Image Translation via Disentangled Representations

Arxiv

13+阅读 · 2018年8月2日

相关基金

量子Toroidal代数的表示、应用及推广

国家自然科学基金

1+阅读 · 2015年12月31日

基于视频图像处理的神经导航空间配准方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

地理场景协同的多摄像机目标跟踪模型

国家自然科学基金

1+阅读 · 2013年12月31日

几何与图像计算中的变分方法与算法

国家自然科学基金

0+阅读 · 2013年12月31日

高时空分辨率的动态对象几何与运动信息获取方法

国家自然科学基金

0+阅读 · 2012年12月31日

基于多维模糊数的多通道不确定信息的表示及融合方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向实时逼真绘制的非传统相机模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于超图形XGML的图像半结构化研究

国家自然科学基金

0+阅读 · 2012年12月31日

约化群酉表示的branching law及其应用

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员