In this paper, we introduce MLB-YouTube, a challenging new dataset designed for fine-grained activity detection. The dataset supports two settings: classification of segmented videos and activity detection in continuous videos. We experimentally compare several recognition approaches that capture temporal structure in activity videos, both by classifying segmented videos and by extending those approaches to continuous videos. We also compare models on the extremely difficult task of predicting pitch speed and pitch type from broadcast baseball videos. We find that learning temporal structure is valuable for fine-grained activity recognition.