Modern machine learning pipelines are constrained by data availability, storage quotas, privacy regulations, and expensive annotation processes. These constraints make it difficult or impossible to train and update large-scale models on dynamic annotated datasets. Continual learning addresses this problem directly: its ultimate goal is to devise methods by which a deep neural network learns relevant patterns for new (unseen) classes without significantly degrading its performance on previously learned ones. In this paper, we address the problem of continual learning for video data. We introduce PIVOT, a novel method that leverages the extensive knowledge in pre-trained models from the image domain, thereby reducing the number of trainable parameters and the associated forgetting. Unlike previous methods, ours is the first approach that effectively uses prompting mechanisms for continual learning without any in-domain pre-training. Our experiments show that PIVOT outperforms state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.