Global Prototype Encoding for Incremental Video Highlights Detection

Video highlights detection, a long-studied task in computer vision, extracts user-appealing clips from previously unseen raw video inputs. However, most mainstream methods in this line of research rely on the closed-world assumption: a fixed number of highlight categories is defined in advance, and all training data must be available at the same time. As a result, they scale poorly with respect to both the number of highlight categories and the size of the dataset. To tackle this problem, we propose a video highlights detector that is able to learn incrementally, namely \textbf{G}lobal \textbf{P}rototype \textbf{E}ncoding (GPE), which captures newly defined video highlights in the extended dataset via their corresponding prototypes. Alongside, we present a well-annotated and costly dataset termed \emph{ByteFood}, comprising more than 5.1k gourmet videos belonging to four different domains: \emph{cooking}, \emph{eating}, \emph{food material}, and \emph{presentation}. To the best of our knowledge, this is the first time that the incremental learning setting has been introduced to video highlights detection, which in turn eases the training-data burden and improves the scalability of conventional neural networks with respect to both the size of the dataset and the number of domains. Moreover, the proposed GPE surpasses current incremental learning methods on \emph{ByteFood}, reporting an improvement of at least 1.57\% mAP. The code and dataset will be made available soon.
