Capturing and processing video is increasingly common as cameras become cheaper to deploy. At the same time, rich video understanding methods have progressed greatly in the last decade. As a result, many organizations now have massive repositories of video data, with applications in mapping, navigation, autonomous driving, and other areas. Because state-of-the-art object detection methods are slow and expensive, our ability to process even simple ad-hoc object search queries ('find 100 traffic lights in dashcam video') over this accumulated data lags far behind our ability to collect it. Processing video at a reduced sampling rate is a reasonable default strategy for these types of queries; however, the ideal sampling rate is both data- and query-dependent. We introduce ExSample, a low-cost framework for object search over unindexed video that quickly processes search queries by adapting the number and location of sampled frames to the particular data and query being processed. ExSample prioritizes the processing of frames in a video repository so that processing is focused on the portions of video most likely to contain objects of interest. It continually re-prioritizes processing based on feedback from previously processed frames. On large, real-world datasets, ExSample reduces processing time by up to 6x over an efficient random sampling baseline and by several orders of magnitude over state-of-the-art methods that train specialized per-query surrogate models. ExSample is thus a key component in building cost-efficient video data management systems.
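The feedback loop described above can be illustrated with a minimal sketch. It is not the ExSample implementation: the split of the repository into chunks, the Thompson-style Gamma draw used to score each chunk, and the names `adaptive_search`, `chunks`, and `detect` are all assumptions made for illustration, and the abstract does not specify any of them.

```python
import random


def adaptive_search(chunks, detect, target, budget, seed=0):
    """Illustrative adaptive frame sampling (hypothetical, not the paper's code).

    chunks: list of lists of frame ids (assumed partition of the repository).
    detect: callable mapping a frame id to a set of object ids (assumed detector).
    target: stop once this many distinct objects have been found.
    budget: maximum number of frames to run the detector on.
    """
    rng = random.Random(seed)
    # Shuffle each chunk once so that pop() yields a random unprocessed frame.
    remaining = [rng.sample(c, len(c)) for c in chunks]
    new_hits = [0] * len(chunks)  # new objects discovered per chunk so far
    sampled = [0] * len(chunks)   # frames processed per chunk so far
    found = set()
    processed = 0

    while len(found) < target and processed < budget:
        live = [i for i, frames in enumerate(remaining) if frames]
        if not live:
            break
        # Assumed scoring rule: draw a plausible "new objects per frame" rate
        # for each chunk from a Gamma belief, then sample from the best draw.
        scores = {
            i: rng.gammavariate(new_hits[i] + 0.1, 1.0 / (sampled[i] + 1.0))
            for i in live
        }
        best = max(scores, key=scores.get)

        frame = remaining[best].pop()
        fresh = detect(frame) - found  # only objects not seen before count
        found |= fresh

        # Feedback step: update the chunk's statistics so that later draws
        # favor chunks that keep yielding new objects.
        new_hits[best] += len(fresh)
        sampled[best] += 1
        processed += 1

    return found, processed
```

In this sketch, a chunk that keeps producing new objects gets higher scores and is sampled more often. A chunk that yields nothing is visited less and less, but is never ruled out entirely, because the random draw leaves room for exploration.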