Many believe that the successes of deep learning on image understanding problems can be replicated in the realm of video understanding. However, the span of video action problems and the set of proposed deep learning solutions are arguably wider and more diverse than those of their 2D image siblings. Finding, identifying, and predicting actions are among the most salient tasks in video action understanding. This tutorial clarifies a taxonomy of video action problems, highlights the datasets and metrics used to benchmark each problem, describes common data preparation methods, and presents the building blocks of state-of-the-art deep learning model architectures.