Representing and synthesizing novel views of real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically handle dynamic scenes by applying geometry techniques or exploiting temporal information between adjacent frames, without considering the underlying background distribution of the entire scene or the transmittance along the ray dimension, which limits their performance on static and occluded areas. Our approach, $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields ($\text{D}^4$NeRF), offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene. Specifically, it employs a neural representation to capture the scene distribution of the static background and a 6D-input NeRF to represent dynamic objects. Each ray sample is given an additional occlusion weight indicating its transmittance through the static and dynamic components. We evaluate $\text{D}^4$NeRF on public dynamic scenes and on our urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.
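To make the per-sample occlusion weight concrete, below is a minimal sketch of how such a weight could blend static and dynamic branches during volume rendering. It assumes the two branches each predict a colour and a density per ray sample and that the weight `occ_w` lies in $[0, 1]$ (1 for dynamic, 0 for static); the function name, tensor shapes, and the exact blending formula are illustrative assumptions, not the paper's verified formulation.

```python
import torch

def composite_render(static_rgb, static_sigma, dyn_rgb, dyn_sigma, occ_w, deltas):
    """Blend static and dynamic radiance fields along one ray (illustrative sketch).

    static_rgb, dyn_rgb     : (N_samples, 3) colours from the two branches
    static_sigma, dyn_sigma : (N_samples,) volume densities
    occ_w                   : (N_samples,) occlusion weight in [0, 1];
                              1 -> sample belongs to the dynamic component,
                              0 -> static background
    deltas                  : (N_samples,) distances between adjacent samples
    """
    # Per-sample alpha for each branch, modulated by the occlusion weight.
    alpha_d = 1.0 - torch.exp(-occ_w * dyn_sigma * deltas)
    alpha_s = 1.0 - torch.exp(-(1.0 - occ_w) * static_sigma * deltas)
    alpha = 1.0 - (1.0 - alpha_d) * (1.0 - alpha_s)

    # Accumulated transmittance T_i = prod_{j < i} (1 - alpha_j).
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:1]), 1.0 - alpha + 1e-10])[:-1], dim=0)

    # Composite colour: each sample contributes its blended static/dynamic colour.
    rgb = (trans[:, None] * (alpha_d[:, None] * dyn_rgb +
                             alpha_s[:, None] * static_rgb)).sum(dim=0)
    return rgb
```

Because the static branch receives the complement $(1 - \text{occ\_w})$, supervising the rendered image also drives the weight to route density either into the background or into the dynamic field, which is what allows the background to be detached cleanly.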