Peering, a side-to-side motion that animals use to estimate distance through motion parallax, offers a powerful bio-inspired strategy for overcoming a fundamental limitation of robotic vision: partial occlusion. Conventional robot cameras, with their small apertures and large depth of field, render both foreground obstacles and background objects in sharp focus, so occluders obscure critical scene information. This work establishes a formal connection between animal peering and synthetic aperture (SA) sensing from optical imaging. As a robot executes a peering motion, its camera sweeps out a wide synthetic aperture. Computationally integrating the captured images synthesizes an image with an extremely shallow depth of field, effectively blurring out occluding elements while bringing the background into sharp focus. This efficient, wavelength-independent technique enables real-time, high-resolution perception across various spectral bands. We demonstrate that this approach not only restores basic scene understanding but also enables advanced visual reasoning in large multimodal models, which otherwise fail on conventionally occluded imagery. Unlike feature-dependent multi-view 3D vision methods or active sensors such as LiDAR, SA sensing via peering is robust to occlusion, computationally efficient, and immediately deployable on any mobile robot. This research bridges animal behavior and robotics, suggesting that peering motions for synthetic aperture sensing are key to advanced scene understanding in complex, cluttered environments.
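To make the computational integration step concrete, the sketch below shows one standard way such synthetic-aperture refocusing can be implemented: each frame captured along the peering sweep is shifted by the disparity its viewpoint induces for points on a chosen focal plane, and all shifted frames are averaged. Content on the focal plane registers across frames and stays sharp; occluders off that plane land at different pixel positions in every frame and average into a blur. This is a minimal sketch assuming a pinhole camera model, purely lateral camera motion with known offsets, and a fronto-parallel focal plane; the function name synthetic_aperture_focus and its parameters are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def synthetic_aperture_focus(images, camera_offsets_m, focus_depth_m,
                             focal_length_px):
    """Synthesize a shallow-depth-of-field image from a peering sweep.

    images           : list of HxW or HxWxC frames from the sweep
    camera_offsets_m : lateral camera displacement (meters) of each frame,
                       relative to the reference (center) viewpoint
    focus_depth_m    : distance (meters) of the plane to bring into focus
    focal_length_px  : camera focal length expressed in pixels

    Assumed geometry (hypothetical, for illustration): pinhole camera,
    purely lateral motion, fronto-parallel focal plane.
    """
    accumulator = np.zeros_like(images[0], dtype=np.float64)
    for frame, dx in zip(images, camera_offsets_m):
        # Disparity (pixels) of a focal-plane point seen from a camera
        # translated laterally by dx: d = f * dx / z (pinhole model).
        disparity_px = focal_length_px * dx / focus_depth_m
        # Shift horizontally so focal-plane content registers across frames
        # (sign depends on the chosen axis convention).
        shift_vec = ((0, disparity_px) if frame.ndim == 2
                     else (0, disparity_px, 0))
        accumulator += nd_shift(frame.astype(np.float64), shift_vec,
                                order=1, mode='nearest')
    return accumulator / len(images)
```

Sweeping focus_depth_m over a range of candidate depths yields a focal stack from which the plane that best suppresses the occluder can be selected, at a cost of one shift-and-add pass per depth.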