Human visual recognition of activities or external agents involves an interplay between high-level plan recognition and low-level perception. A natural question, then, is whether low-level perception can be improved by high-level plan recognition. We formulate the problem of leveraging recognized plans to generate better top-down attention maps \cite{gazzaniga2009,baluch2011} that improve perception performance, and we refer to these maps as plan-recognition-driven attention maps. To address this problem, we introduce the Pixel Dynamics Network. The Pixel Dynamics Network serves as an observation model: given the observed pixels and pixel-level action features, it predicts the next state of the object point at each pixel location, in effect learning a pixel-level dynamics model internally. The Pixel Dynamics Network is a convolutional neural network (ConvNet) with a specially designed architecture, so it can exploit the parallel computation of ConvNets while learning the pixel-level dynamics model. We further prove that the Pixel Dynamics Network, used as an observation model, is equivalent to the belief update in the partially observable Markov decision process (POMDP) framework. We evaluate the Pixel Dynamics Network on event recognition tasks by building an event recognition system, ER-PRN, which uses the Pixel Dynamics Network as a subroutine to recognize events from observations augmented by plan-recognition-driven attention.
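For reference, the POMDP belief update mentioned above is the standard Bayes filter $b'(s') = \eta \, O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)$, where $\eta$ normalizes $b'$ to sum to one. The sketch below implements this textbook update for a small discrete POMDP in NumPy. It is only an illustration of the update the equivalence result refers to; the function name `belief_update`, the array layouts of `T` and `O`, and the toy numbers are assumptions made for this example, not the Pixel Dynamics Network or the paper's proof.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Standard discrete POMDP belief update (Bayes filter); illustrative only.

    belief: shape (S,), current probability distribution over states
    T: shape (A, S, S), assumed layout T[a, s, s2] = P(s2 | s, a)
    O: shape (A, S, Z), assumed layout O[a, s2, z] = P(z | s2, a)
    """
    # Prediction step: sum_s T(s' | s, a) * b(s) for every next state s'.
    predicted = belief @ T[action]
    # Correction step: weight each s' by the likelihood of the observation.
    unnormalized = O[action][:, observation] * predicted
    total = unnormalized.sum()
    if total == 0:
        raise ValueError("observation has zero probability under the current belief")
    # Normalization (the eta term) so the new belief sums to one.
    return unnormalized / total

# Hypothetical toy POMDP: 1 action, 2 states, 2 observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
b1 = belief_update(np.array([0.5, 0.5]), action=0, observation=0, T=T, O=O)
print(b1)  # approximately [0.765, 0.235]
```

Note that the prediction and correction steps act independently on each next state before the single normalization, which is the kind of per-location structure a convolutional architecture can compute in parallel; how the Pixel Dynamics Network maps pixels and action features onto states is not specified in this abstract.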