The influence of natural image transformations on receptive field responses is crucial for modelling visual operations in computer vision and biological vision. In this regard, covariance properties with respect to geometric image transformations in the earliest layers of the visual hierarchy are essential for expressing robust image operations, and for formulating invariant visual operations at higher levels. This paper defines and proves a set of joint covariance properties for spatio-temporal receptive fields in terms of spatio-temporal derivative operators applied to spatio-temporally smoothed image data under compositions of spatial scaling transformations, spatial affine transformations, Galilean transformations and temporal scaling transformations. Specifically, the derived relations show how the parameters of the receptive fields need to be transformed, in order to match the output from spatio-temporal receptive fields under composed spatio-temporal image transformations. For this purpose, we also fundamentally extend the notion of scale-normalized derivatives to affine-normalized derivatives, that are computed based on spatial smoothing with affine Gaussian kernels, and analyze the covariance properties of the resulting affine-normalized derivatives for the affine group as well as for important subgroups thereof. We conclude with a geometric analysis, showing how the derived joint covariance properties make it possible to relate or match spatio-temporal receptive field responses, when observing, possibly moving, local surface patches from different views, under locally linearized perspective or projective transformations, as well as when observing different instances of spatio-temporal events, that may occur either faster or slower between different views of similar spatio-temporal events.
翻译:暂无翻译