Few Shot Activity Recognition Using Variational Inference

From arXiv. Accepted at the IJCAI 2021 3rd International Workshop on Deep Learning for Human Activity Recognition. arXiv admin note: text overlap with arXiv:1611.09630, arXiv:1909.07945 by other authors.

In the last few years there has been remarkable progress in learning models that can recognise novel classes from only a few labeled examples. Few-shot learning (FSL) for action recognition is the challenging task of recognising novel action categories that are represented by only a few instances in the training data. We propose a novel variational inference based architectural framework (HF-AR) for few shot activity recognition. Our framework leverages volume-preserving Householder Flow to learn a flexible posterior distribution of the novel classes. This results in better performance as compared to state-of-the-art few shot approaches for human activity recognition. Our architecture consists of a base model and an adapter model. The base model is trained on seen classes and computes an embedding that represents the spatial and temporal insights extracted from the input video, e.g. using a combination of ResNet-152 and an LSTM based encoder-decoder model. The adapter model applies a series of Householder transformations to compute a flexible posterior distribution that lends higher accuracy in the few shot setting. Extensive experiments on three well-known datasets, UCF101, HMDB51 and Something-Something-V2, demonstrate similar or better performance on 1-shot and 5-shot classification as compared to state-of-the-art few shot approaches that use only RGB frame sequences as input. To the best of our knowledge, we are the first to explore variational inference along with Householder transformations to capture the full-rank covariance matrix of the posterior distribution for few shot learning in activity recognition.
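To make the Householder Flow idea concrete, here is a minimal NumPy sketch of a generic Householder flow applied to one sample from a diagonal Gaussian posterior. It is not the authors' HF-AR implementation. The dimension, the number of steps, and the randomly drawn `mu`, `log_var`, and Householder vectors `vs` are all assumptions made for illustration; in a trained model these quantities would come from learned networks.

```python
import numpy as np

def householder_step(z, v):
    """Apply H = I - 2 v v^T / (v^T v) to z without forming the matrix H."""
    return z - 2.0 * v * (v @ z) / (v @ v)

def householder_flow(z0, vs):
    """Apply a series of Householder reflections, one per row of vs."""
    z = z0
    for v in vs:
        z = householder_step(z, v)
    return z

# Hypothetical sizes and parameters (illustrative only, not from the paper).
rng = np.random.default_rng(0)
dim, num_steps = 8, 4
mu = rng.normal(size=dim)                # posterior mean
log_var = rng.normal(size=dim)           # log of the diagonal variances
vs = rng.normal(size=(num_steps, dim))   # stand-ins for learned Householder vectors

# Reparameterised sample from the diagonal Gaussian, then the flow.
eps = rng.normal(size=dim)
z0 = mu + np.exp(0.5 * log_var) * eps
zT = householder_flow(z0, vs)

# Explicit matrix form of the same flow: U = H_T ... H_1.
U = np.eye(dim)
for v in vs:
    H = np.eye(dim) - 2.0 * np.outer(v, v) / (v @ v)
    U = H @ U

# Covariance of the transformed variable: U diag(sigma^2) U^T.
cov = U @ np.diag(np.exp(log_var)) @ U.T
```

Each reflection is orthogonal, so the product `U` is orthogonal and its determinant has absolute value 1. This is why the flow is volume-preserving and adds no Jacobian term to the variational objective. The resulting covariance `U diag(sigma^2) U^T` generally has non-zero off-diagonal entries, which is how a diagonal posterior is turned into one with a full covariance matrix.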
