Zero-shot action recognition is the task of recognizing action classes for which no visual examples are available, using only a semantic embedding that relates the unseen classes to the seen ones. The problem can be framed as learning a function that generalizes well to instances of unseen classes without losing the ability to discriminate between classes. Neural networks can model complex boundaries between visual classes, which explains their success as supervised models. In zero-shot learning, however, these highly specialized class boundaries may not transfer well from seen to unseen classes. In this paper, we propose a clustering-based model that considers all training samples at once, instead of optimizing for each instance individually. We optimize the clustering using Reinforcement Learning, which we show is critical for our approach to work. We call the proposed method CLASTER and observe that it consistently improves over the state of the art on all standard datasets (UCF101, HMDB51, and Olympic Sports), in both the standard zero-shot evaluation and the generalized zero-shot learning setting.
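The abstract does not describe CLASTER's architecture, so the following is only a minimal sketch of the general idea it motivates, and not the authors' method. Visual features are projected into the semantic space. Each projection is then pulled toward centroids computed over all training samples at once, and the nearest unseen-class embedding is predicted. Several pieces are assumptions. The linear projection `W` is taken as given. Plain k-means stands in for the paper's Reinforcement Learning optimization of the clustering. The helper names `kmeans`, `soft_assign`, `zero_shot_predict` and the parameters `alpha` and `temperature` are hypothetical.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Lloyd's k-means over ALL training samples at once.

    Hypothetical stand-in: CLASTER optimizes its clustering with
    reinforcement learning, which this sketch does not reproduce.
    """
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every sample to every centroid: shape (n, k).
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    return centroids


def soft_assign(x, centroids, temperature=1.0):
    """Softmax over negative squared distances to each centroid."""
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    logits = -d / temperature
    logits -= logits.max(1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(1, keepdims=True)


def zero_shot_predict(feats, W, centroids, class_emb, alpha=0.5):
    """Assign each sample to the unseen class with the most similar embedding."""
    z = feats @ W  # project visual features into the semantic space (assumed linear)
    # Blend each projection with the centroids it is softly assigned to.
    z = alpha * z + (1 - alpha) * soft_assign(z, centroids) @ centroids
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    c = class_emb / np.linalg.norm(class_emb, axis=1, keepdims=True)
    return (z @ c.T).argmax(1)  # cosine similarity, nearest class wins


# Toy usage: 2-D features, identity projection, two unseen class embeddings.
train_proj = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
centroids = kmeans(train_proj, k=2)
unseen_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
test_feats = np.array([[0.8, 0.2], [0.2, 0.8]])
preds = zero_shot_predict(test_feats, np.eye(2), centroids, unseen_emb)
```

In this toy case each test sample is pulled toward the cluster it already lies near, so `preds` is `[0, 1]`. The design point the sketch illustrates is that the centroids depend on the whole training set rather than on any single instance. This loosely mirrors the motivation stated in the abstract.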