Unsupervised segmentation of action segments in egocentric videos is a desirable feature in tasks such as activity recognition and content-based video retrieval. Reducing the search space to a finite set of action segments enables faster and less noisy matching. However, a substantial gap remains in machine understanding of the natural temporal cuts that occur during a continuous human activity. This work reports a novel gaze-based approach for segmenting action segments in videos captured with an egocentric camera. Gaze is used to locate the region-of-interest inside a frame. By tracking two simple motion-based parameters inside successive regions-of-interest, we discover a finite set of temporal cuts. We present results for several combinations of the two parameters on the BRISGAZE-ACTIONS dataset, which contains egocentric videos depicting several daily-living activities. The quality of the temporal cuts is further improved by applying two entropy measures.
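To make the pipeline concrete, the sketch below illustrates one plausible reading of the approach, assuming OpenCV and NumPy: crop a square region-of-interest around each gaze point, compute two simple motion-based parameters inside successive regions-of-interest, and flag a temporal cut when both parameters jump relative to a running baseline. The two parameters used here (mean optical-flow magnitude and flow-direction dispersion), the ROI size, and the cut threshold are illustrative assumptions; the abstract does not specify the actual parameters or the entropy-based refinement.

```python
# Hypothetical sketch of gaze-driven temporal cut detection.
# The two motion parameters, ROI size, and threshold are assumptions,
# not the paper's actual choices.
import cv2
import numpy as np

ROI_HALF = 64          # half-size of the square region-of-interest (assumed)
CUT_THRESHOLD = 1.5    # relative change required to declare a cut (assumed)

def roi(frame, gaze_xy, half=ROI_HALF):
    """Crop a square region-of-interest centred on the gaze point."""
    h, w = frame.shape[:2]
    x = int(np.clip(gaze_xy[0], half, w - half))
    y = int(np.clip(gaze_xy[1], half, h - half))
    return frame[y - half:y + half, x - half:x + half]

def motion_params(prev_roi, curr_roi):
    """Two simple motion-based parameters inside the ROI:
    mean optical-flow magnitude and circular dispersion of flow direction."""
    flow = cv2.calcOpticalFlowFarneback(prev_roi, curr_roi, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    dispersion = 1.0 - np.hypot(np.mean(np.cos(ang)), np.mean(np.sin(ang)))
    return float(np.mean(mag)), float(dispersion)

def detect_cuts(gray_frames, gaze_points):
    """Flag frame indices where both parameters deviate from a running mean."""
    cuts, history = [], []
    for i in range(1, len(gray_frames)):
        p = motion_params(roi(gray_frames[i - 1], gaze_points[i - 1]),
                          roi(gray_frames[i], gaze_points[i]))
        if history:
            baseline = np.mean(history, axis=0)
            if all(abs(v - b) > CUT_THRESHOLD * (b + 1e-6)
                   for v, b in zip(p, baseline)):
                cuts.append(i)
                history.clear()    # restart the baseline after each cut
        history.append(p)
    return cuts
```

In this reading, each detected index marks the boundary between two action segments; a refinement pass over the candidate cuts, such as the entropy measures mentioned above, would then prune spurious boundaries.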