When interacting with a three-dimensional world, humans must estimate 3D structure from visual inputs projected down to two-dimensional retinal images. It has been shown that humans use the persistence of object shape over motion-induced transformations as a cue to resolve depth ambiguity in this underconstrained problem. With the aim of understanding how biological vision systems may internally represent 3D transformations, we propose a computational model, based on a generative manifold model, that infers 3D structure from the motion of 2D points. Our model can also learn representations of the transformations with minimal supervision, providing a proof of concept for how humans may develop internal representations on a developmental or evolutionary time scale. Focusing on rotational motion, we show how our model infers depth from moving 2D projected points, learns 3D rotational transformations from 2D training stimuli, and compares to human performance in psychophysical structure-from-motion experiments.
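To make the stimulus setup concrete, the following is a minimal sketch of the kind of 2D input described above: orthographic projections of a rigid 3D point set rotating about the vertical axis, producing a motion sequence whose depth is ambiguous in any single frame. All function names and parameters here are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
points_3d = rng.uniform(-1.0, 1.0, size=(10, 3))  # a random rigid 3D structure

def rotate_y(points, theta):
    """Rotate an (N, 3) array of points about the y-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return points @ R.T

def project(points):
    """Orthographic projection onto the image plane: drop the depth (z) axis."""
    return points[:, :2]

# A short motion sequence: each frame is the 2D projection of the rotated set.
# Any single frame is depth-ambiguous; rigidity across frames constrains depth.
frames = np.stack([project(rotate_y(points_3d, t))
                   for t in np.linspace(0.0, np.pi / 4, 8)])
print(frames.shape)  # (8, 10, 2): 8 frames, 10 points, 2D coordinates
```

Note that under orthographic projection the sequence generated with angles `theta` and `-theta` is identical up to a depth reflection, which is exactly the ambiguity that rigidity-over-motion cues are used to resolve.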