The world is composed of objects, the ground, and the sky. Visual perception of objects requires solving two fundamental challenges: segmenting visual input into discrete units, and tracking the identities of these units despite appearance changes caused by object deformation, changing perspective, and dynamic occlusion. Current computer vision methods for segmentation and tracking that approach human performance all require learning, raising the question: can objects be segmented and tracked without learning? Here, we show that the mathematical structure of light rays reflected from environment surfaces yields a natural representation of persistent surfaces, and that this surface representation provides a solution to both the segmentation and tracking problems. We describe how to generate this surface representation from continuous visual input, and demonstrate that our approach can segment and invariantly track objects in cluttered synthetic video despite severe appearance changes, without requiring learning.