Machine learning models of visual action recognition are typically trained and tested on data from specific situations where actions are associated with certain objects. It is an open question how action-object associations in the training set influence a model's ability to generalize beyond trained situations. We set out to identify properties of training data that lead to action recognition models with greater generalization ability. To do this, we take inspiration from a cognitive mechanism called cross-situational learning, which states that human learners extract the meaning of concepts by observing instances of the same concept across different situations. We perform controlled experiments with various types of action-object associations, and identify key properties of action-object co-occurrence in training data that lead to better classifiers. Given that these properties are missing in the datasets typically used to train action classifiers in the computer vision literature, our work provides useful insights into how to construct datasets that efficiently train models with better generalization.
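The cross-situational idea can be illustrated with a toy measure of action-object co-occurrence diversity. This is a hypothetical sketch for intuition only (the function name and example labels are our own, not the paper's metric or data): a training set where each action is always paired with the same object has diversity 1 per action, while a cross-situational set pairs each action with several objects.

```python
from collections import defaultdict

def object_diversity(samples):
    """Count distinct co-occurring objects per action in a list of
    (action, object) training samples."""
    seen = defaultdict(set)
    for action, obj in samples:
        seen[action].add(obj)
    return {action: len(objs) for action, objs in seen.items()}

# One-to-one pairing: each action always appears with the same object.
fixed = [("cut", "apple"), ("cut", "apple"),
         ("open", "door"), ("open", "door")]

# Cross-situational pairing: the same action is seen with varied objects.
varied = [("cut", "apple"), ("cut", "paper"),
          ("open", "door"), ("open", "jar")]

print(object_diversity(fixed))   # each action co-occurs with 1 object
print(object_diversity(varied))  # each action co-occurs with 2 objects
```

Under the cross-situational hypothesis, a classifier trained on the `varied` set should rely less on object identity as a shortcut for predicting the action.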