For research results to be comparable, it is important to have common datasets for experimentation and evaluation. The size of such datasets, however, can be an obstacle to their use. The Vimeo Creative Commons Collection (V3C) is a video dataset designed to be representative of video content found on the web, containing roughly 3800 hours of video in total, split into three shards. In this paper, we present insights on the second of these shards (V3C2) and discuss their implications for research areas, such as video retrieval, for which the dataset might be particularly useful. We also provide all the extracted data in order to simplify the use of the dataset.