Depth foundation models offer strong learned priors for 3D perception but lack physical depth cues, which leaves metric scale ambiguous. We introduce a birefringent metalens (a planar nanophotonic lens composed of subwavelength pixels for wavefront shaping, 700 nm thick and 3 mm in diameter) to physically prompt depth foundation models. In a single monocular shot, our metalens physically embeds depth information into two polarized optical wavefronts, which we decode through a lightweight prompting and fine-tuning framework that aligns depth foundation models with the optical signals. To scale the training data, we develop a light wave propagation simulator that synthesizes metalens responses from RGB-D datasets, incorporating key physical factors to minimize the sim-to-real gap. Simulated and physical experiments with our fabricated titanium-dioxide metalens demonstrate more accurate and consistent metric depth than state-of-the-art monocular depth estimators. Our results show that nanophotonic wavefront formation offers a promising bridge for grounding depth foundation models in physical depth sensing.