Video understanding has received more attention in the past few years due to the availability of several large-scale video datasets. However, annotating large-scale video datasets is cost-intensive. In this work, we propose t-EVA, a time-efficient video annotation method that uses spatio-temporal feature similarity and t-SNE dimensionality reduction to speed up the annotation process substantially. Placing the same actions from different videos near each other in a two-dimensional space based on feature similarity helps the annotator group-label video clips. We evaluate our method on two subsets of ActivityNet (v1.3) and a subset of the Sports-1M dataset. We show that t-EVA can outperform other video annotation tools while maintaining test accuracy on video classification.
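The group-labeling idea can be illustrated with a minimal sketch. It assumes the clips have already been projected to two dimensions (for example by t-SNE over spatio-temporal features), and it models the annotator's selection as a circle around a chosen point. The circular selection, the function name `label_group`, the coordinates, and the class names are all hypothetical and are not taken from the paper.

```python
from math import hypot


def label_group(points, labels, center, radius, label):
    """Assign `label` to every still-unlabeled clip whose 2D point lies
    within `radius` of `center`. Returns the indices that were labeled."""
    cx, cy = center
    selected = []
    for i, (x, y) in enumerate(points):
        if labels[i] is None and hypot(x - cx, y - cy) <= radius:
            labels[i] = label
            selected.append(i)
    return selected


# Hypothetical 2D coordinates standing in for a t-SNE projection of clip features.
embedding = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (5.0, 5.1), (5.2, 4.9), (9.0, 0.5)]
labels = [None] * len(embedding)

# One selection labels a whole cluster of nearby clips at once.
first = label_group(embedding, labels, center=(0.2, 0.2), radius=1.0, label="surfing")
second = label_group(embedding, labels, center=(5.1, 5.0), radius=1.0, label="archery")

# Clips outside every selection stay unlabeled and can be handled individually.
unlabeled = [i for i, lab in enumerate(labels) if lab is None]
```

In this toy setting two selections label five of the six clips, which is the kind of saving the abstract attributes to placing similar actions near each other in the two-dimensional space.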