Reflective and textureless surfaces such as windows, mirrors, and walls are challenging for object and scene reconstruction. These surfaces are often poorly reconstructed, leaving depth discontinuities and holes that make it difficult to cohesively reconstruct scenes that contain these planar discontinuities. We propose Echoreconstruction, an audio-visual method that uses reflections of sound to aid geometry and audio reconstruction for virtual conferencing, teleimmersion, and other AR/VR experiences. Our mobile phone prototype emits pulsed audio while recording video for RGB-based 3D reconstruction and audio-visual classification. Reflected sound and images from the video are input into our audio (EchoCNN-A) and audio-visual (EchoCNN-AV) convolutional neural networks for surface and sound source detection, depth estimation, and material classification. The inferences from these classifications enhance 3D reconstructions of scenes containing open spaces and reflective surfaces through depth filtering, inpainting, and placement of unmixed sound sources in the scene. Our prototype, VR demo, and experimental results from real-world and virtual scenes with challenging surfaces and sounds indicate high success rates on material classification, depth estimation, and closed/open surface classification, leading to considerable visual and audio improvement in 3D scenes (see Figure 1).
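To give some intuition for why a reflected pulse carries depth information, the following is a minimal sketch of classical time-of-flight ranging. It is not the EchoCNN-A or EchoCNN-AV networks described above, which learn depth from recorded echoes. The chirp parameters, the attenuation factor, the noise level, and the helper names `make_chirp`, `simulate_echo`, and `echo_depth` are all assumptions made for illustration, and the synthetic recording contains only a single echo with no direct-path sound.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at about 20 degrees C


def make_chirp(sample_rate, duration=0.01, f0=2000.0, f1=8000.0):
    """Linear chirp used as the emitted pulse (parameters are illustrative)."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    sweep_rate = (f1 - f0) / duration
    return np.sin(2.0 * np.pi * (f0 * t + 0.5 * sweep_rate * t**2))


def simulate_echo(pulse, sample_rate, distance_m, total_samples=4096, seed=0):
    """Synthetic recording: one attenuated echo from a surface plus mild noise."""
    delay = int(round(2.0 * distance_m / SPEED_OF_SOUND * sample_rate))
    rng = np.random.default_rng(seed)
    recording = 0.01 * rng.standard_normal(total_samples)
    recording[delay:delay + len(pulse)] += 0.3 * pulse  # assumed attenuation
    return recording


def echo_depth(pulse, recording, sample_rate):
    """Estimate surface distance from the round-trip delay of the echo."""
    # Cross-correlate the recording with the emitted pulse; the lag of the
    # strongest peak is the round-trip delay in samples.
    corr = np.correlate(recording, pulse, mode="full")
    lags = np.arange(-(len(pulse) - 1), len(recording))
    causal = lags >= 0  # an echo cannot arrive before the pulse is emitted
    delay_samples = lags[causal][np.argmax(np.abs(corr[causal]))]
    # Halve the round-trip time to get the one-way distance.
    return SPEED_OF_SOUND * (delay_samples / sample_rate) / 2.0


sample_rate = 48000
pulse = make_chirp(sample_rate)
recording = simulate_echo(pulse, sample_rate, distance_m=2.0)
print(f"estimated depth: {echo_depth(pulse, recording, sample_rate):.3f} m")
```

In a real room the recording would also contain the direct sound and multiple overlapping reflections, which is one reason a learned classifier can be more practical than a single correlation peak.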