Video understanding is an important problem in computer vision. Currently, the most well-studied task in this area is human action recognition, where clips are manually trimmed from long videos and a single class of human action is assumed for each clip. However, industrial applications may present more complicated scenarios. For example, in real-world urban pipe systems, anomaly defects are fine-grained, multi-labeled, and domain-relevant. To recognize them correctly, we need to understand the detailed video content. For this reason, we propose to advance research on video understanding, with a shift from traditional action recognition to industrial anomaly analysis. In particular, we introduce two high-quality video benchmarks, namely QV-Pipe and CCTV-Pipe, for anomaly inspection in real-world urban pipe systems. Based on these new datasets, we will host two competitions: (1) Video Defect Classification on QV-Pipe and (2) Temporal Defect Localization on CCTV-Pipe. In this report, we describe the details of these benchmarks, the problem definitions of the competition tracks, the evaluation metric, and a summary of the results. We expect this competition to bring new opportunities and challenges for video understanding in smart cities and beyond. The details of our VideoPipe challenge can be found at https://videopipe.github.io.