This paper proposes an efficient video summarization framework that conveys the gist of an entire video in a few key-frames or video skims. Existing video summarization frameworks rely on algorithms that use either low-level computer vision feature extraction or high-level domain-level extraction. However, humans, the ultimate users of the summarized video, remain the most neglected aspect. This paper therefore considers the human's role in summarization and introduces summarization techniques based on human visual attention. To understand human attention behavior, we designed and conducted experiments with human participants using electroencephalogram (EEG) and eye-tracking technology. The EEG and eye-tracking data obtained from these experiments are processed simultaneously and used to segment the frames containing useful information from a considerable volume of video. Thus, frame segmentation relies primarily on the cognitive judgments of human beings. Using our approach, a video is condensed by 96.5% while maintaining high precision and recall. A comparison with state-of-the-art techniques demonstrates that the proposed approach yields ceiling-level performance at reduced computational cost when summarizing videos.