SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections - Zhuanzhi Papers


3D scenes · Scene representation · Parameterization · Representation · Noise

April 19, 2023

SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections


Zhaoxi Chen, Guangcong Wang, Ziwei Liu

From arXiv. Project page: https://scene-dreamer.github.io/ Code: https://github.com/FrozenBurning/SceneDreamer

In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics. This BEV scene representation enables 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Moreover, we propose a novel generative neural hash grid to parameterize the latent space based on 3D positions and scene semantics, aiming to encode generalizable features across various scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.
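
The BEV representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps simplex noise for simple value noise so that it needs only the Python standard library, and the label names, height thresholds, and function names (`make_bev_scene`, `bev_vs_voxel_cells`) are hypothetical, chosen only for illustration. The sketch builds a height field and a semantic field on an N x N grid, then compares the number of stored cells with a dense N x N x N voxel grid, which is one way to read the "quadratic complexity" claim.

```python
import math
import random

# Hypothetical semantic classes and height thresholds (not from the paper).
LABELS = ("water", "sand", "grass", "rock")
THRESHOLDS = (0.35, 0.45, 0.70)


def lattice_value(ix, iy, seed):
    """Deterministic pseudo-random value in [0, 1) for an integer grid point."""
    key = (ix * 73856093) ^ (iy * 19349663) ^ (seed * 83492791)
    return random.Random(key).random()


def smoothstep(t):
    """Ease curve that flattens the interpolation slope at lattice lines."""
    return t * t * (3.0 - 2.0 * t)


def value_noise_2d(x, y, seed=0):
    """Value noise, a simple stand-in for simplex noise; result lies in [0, 1]."""
    x0, y0 = math.floor(x), math.floor(y)
    tx, ty = smoothstep(x - x0), smoothstep(y - y0)
    v00 = lattice_value(x0, y0, seed)
    v10 = lattice_value(x0 + 1, y0, seed)
    v01 = lattice_value(x0, y0 + 1, seed)
    v11 = lattice_value(x0 + 1, y0 + 1, seed)
    low = v00 + tx * (v10 - v00)   # interpolate along x at row y0
    high = v01 + tx * (v11 - v01)  # interpolate along x at row y0 + 1
    return low + ty * (high - low)


def label_from_height(h):
    """Assumed rule: pick a semantic class by thresholding the height."""
    for name, limit in zip(LABELS, THRESHOLDS):
        if h < limit:
            return name
    return LABELS[-1]


def make_bev_scene(size=32, scale=8.0, seed=0):
    """Return (height_field, semantic_field), each a size x size nested list."""
    height = [
        [value_noise_2d(i / scale, j / scale, seed) for j in range(size)]
        for i in range(size)
    ]
    semantic = [[label_from_height(h) for h in row] for row in height]
    return height, semantic


def bev_vs_voxel_cells(n):
    """Cells stored by two N x N BEV maps versus a dense N^3 voxel grid."""
    return 2 * n * n, n ** 3


if __name__ == "__main__":
    height, semantic = make_bev_scene(size=8, scale=4.0, seed=42)
    print(semantic[0])
    print(bev_vs_voxel_cells(256))
```

Because the two maps are stored separately, the geometry (heights) can be changed without touching the labels and vice versa, which mirrors the disentanglement of geometry and semantics that the abstract attributes to the BEV representation.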


