Modern machine learning pipelines are limited by data availability, storage quotas, privacy regulations, and expensive annotation processes. These constraints make it difficult, or even impossible, to maintain a large-scale model trained on growing annotation sets. Continual learning tackles this problem directly, with the ultimate goal of devising methods where a neural network effectively learns relevant patterns for new (unseen) classes without significantly degrading its performance on previously learned ones. In this paper, we address the problem of continual learning for video data. We introduce PIVOT, a novel method that leverages the extensive knowledge in pre-trained models from the image domain, thereby reducing the number of trainable parameters and the associated forgetting. Unlike previous methods, ours is the first approach that effectively uses prompting mechanisms for continual learning without any in-domain pre-training. Our experiments show that PIVOT outperforms state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
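To make the high-level idea concrete, the following is a minimal PyTorch sketch of prompt-based adaptation of a frozen image backbone for video classification. It is an illustration under stated assumptions, not the authors' exact PIVOT implementation: the `FrozenImageEncoder` stub, the prompt length, and the temporal average pooling are all hypothetical choices. The point it demonstrates is the one in the abstract: only the prompt tokens and the classification head receive gradients, which keeps the trainable parameter count, and hence forgetting, small.

```python
# Minimal, illustrative sketch of prompting a frozen image backbone for video.
# NOT the authors' exact PIVOT architecture; backbone stub, prompt length, and
# pooling choice are assumptions made for illustration only.
import torch
import torch.nn as nn


class FrozenImageEncoder(nn.Module):
    """Stand-in for a pre-trained image transformer (e.g., a CLIP-style ViT).

    Takes a sequence of prompt + patch tokens and returns one embedding per
    image. All weights are frozen, so continual learning updates prompts only.
    """

    def __init__(self, dim: int = 512, depth: int = 4, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        for p in self.parameters():
            p.requires_grad = False  # frozen: no in-domain pre-training

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim) -> (batch, dim) via mean pooling
        return self.blocks(tokens).mean(dim=1)


class PromptedVideoClassifier(nn.Module):
    """Learnable prompts + frozen encoder + temporal average over frames."""

    def __init__(self, encoder: FrozenImageEncoder, dim: int = 512,
                 n_prompts: int = 8, n_classes: int = 10):
        super().__init__()
        self.encoder = encoder
        # Only the prompts and the head are trainable, which keeps the number
        # of updated parameters (and the associated forgetting) small.
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, frame_tokens: torch.Tensor) -> torch.Tensor:
        # frame_tokens: (batch, frames, patches, dim)
        b, t, p, d = frame_tokens.shape
        prompts = self.prompts.expand(b * t, -1, -1)
        tokens = torch.cat([prompts, frame_tokens.reshape(b * t, p, d)], dim=1)
        per_frame = self.encoder(tokens).reshape(b, t, d)
        video_emb = per_frame.mean(dim=1)  # temporal average pooling
        return self.head(video_emb)


if __name__ == "__main__":
    model = PromptedVideoClassifier(FrozenImageEncoder())
    x = torch.randn(2, 8, 16, 512)  # 2 videos, 8 frames, 16 patch tokens each
    logits = model(x)
    print(logits.shape)  # torch.Size([2, 10])
```

In a class-incremental setting, an optimizer built only over `model.prompts` and `model.head.parameters()` would be stepped on each new task, while the backbone stays untouched across all tasks.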