Videos are a rich source of high-dimensional structured data, with a wide range of interacting components at varying levels of granularity. In order to improve understanding of unconstrained internet videos, it is important to consider the role of labels at separate levels of abstraction. In this paper, we consider the use of the Bidirectional Inference Neural Network (BINN) for performing graph-based inference in label space for the task of video classification. We take advantage of the inherent hierarchy between labels at increasing granularity. The BINN is evaluated on the first and second release of the YouTube-8M large scale multilabel video dataset. Our results demonstrate the effectiveness of BINN, achieving significant improvements against baseline models.
翻译:视频是高维结构化数据的丰富来源,具有不同颗粒层次的广泛互动组件。为了增进对不受限制的互联网视频的了解,重要的是考虑不同抽象层次标签的作用。在本文中,我们考虑使用双向推断神经网络(BINN)在标签空间中进行基于图形的推断,以完成视频分类任务。我们利用在粒子增加时标签之间的内在等级。BINN是在YouTube-8M大型多标签视频数据集的第一和第二版上进行评估的。我们的结果显示了BINN的有效性,在基线模型上取得了显著改进。