In the last decade, exponential data growth has expanded the capacity of machine learning-based algorithms and enabled their use in daily-life activities. This improvement is also partially explained by the advent of deep learning techniques, i.e., stacks of simple architectures that compose into more complex models. Although both factors produce outstanding results, they also pose drawbacks for the learning process, as training complex models over large datasets is expensive and time-consuming. The problem is even more evident in video analysis. Some works have considered transfer learning or domain adaptation, i.e., approaches that map knowledge from one domain to another, to ease the training burden, yet most of them operate over individual frames or small blocks of frames. This paper proposes a novel approach to map knowledge from action recognition to event recognition using an energy-based model, denoted as the Spectral Deep Belief Network. The model can process all frames simultaneously, carrying spatial and temporal information through the learning process. Experiments conducted over two public video datasets, HMDB-51 and UCF-101, demonstrate the effectiveness of the proposed model and its reduced computational burden when compared to traditional energy-based models, such as Restricted Boltzmann Machines and Deep Belief Networks.