Active stereo cameras that recover depth from structured light captures have become a cornerstone sensor modality for 3D scene reconstruction and understanding tasks across application domains. Existing active stereo cameras project a pseudo-random dot pattern on object surfaces to extract disparity independently of object texture. Such hand-crafted patterns are designed in isolation from the scene statistics, ambient illumination conditions, and the reconstruction method. In this work, we propose the first method to jointly learn structured illumination and reconstruction, parameterized by a diffractive optical element and a neural network, in an end-to-end fashion. To this end, we introduce a novel differentiable image formation model for active stereo, relying on both wave and geometric optics, and a novel trinocular reconstruction network. The jointly optimized pattern, which we dub "Polka Lines," together with the reconstruction network, achieves state-of-the-art active-stereo depth estimates across imaging conditions. We validate the proposed method in simulation and on a hardware prototype, and show that our method outperforms existing active stereo systems.