This paper proposes a novel lightweight thumbnail container-based summarization (LTC-SUM) framework for full feature-length videos. The framework generates a personalized keyshot summary for concurrent users by using the computational resources of the end-user device. State-of-the-art methods that acquire and process the entire video data to generate video summaries are highly computationally intensive. In contrast, the proposed LTC-SUM method uses lightweight thumbnails to handle the complex process of detecting events. This significantly reduces computational complexity and improves communication and storage efficiency by resolving computational and privacy bottlenecks in resource-constrained end-user devices. These improvements were achieved by designing a lightweight 2D CNN model to extract features from thumbnails, which helped select and retrieve only a handful of specific segments. Extensive quantitative experiments on a set of 18 full feature-length videos (approximately 32.9 h in total duration) showed that the proposed method is significantly more computationally efficient than state-of-the-art methods on the same end-user device configurations. Joint qualitative assessments by 56 participants showed higher ratings for the summaries generated using the proposed method. To the best of our knowledge, this is the first attempt at designing a fully client-driven personalized keyshot video summarization framework using thumbnail containers for feature-length videos.
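The segment-selection idea can be illustrated with a minimal sketch. It is not the LTC-SUM implementation: it assumes that each thumbnail stands for one fixed-length video segment and that some model (for example, a 2D CNN) has already assigned each thumbnail a relevance score. The function name `select_keyshots` and the parameters `segment_seconds` and `budget_seconds` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def select_keyshots(
    scores: List[float],
    segment_seconds: float,
    budget_seconds: float,
) -> List[Tuple[float, float]]:
    """Pick the highest-scoring segments that fit a summary time budget.

    Assumption (not from the paper): thumbnail i represents the segment
    [i * segment_seconds, (i + 1) * segment_seconds) of the video.
    """
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")

    # Number of whole segments that fit in the requested summary length.
    max_segments = int(budget_seconds // segment_seconds)

    # Rank thumbnails by score (highest first); ties go to the earlier one.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))

    # Keep the top segments, then restore playback order.
    chosen = sorted(ranked[:max_segments])

    # Only these time ranges would need to be retrieved from the full video.
    return [(i * segment_seconds, (i + 1) * segment_seconds) for i in chosen]
```

In such a scheme only the returned time ranges would be fetched from the full video, so the cost of scoring scales with the number of thumbnails rather than with the size of the video data.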