Selective experience replay is a popular strategy for integrating lifelong learning with deep reinforcement learning. It replays selected experiences from previous tasks to avoid catastrophic forgetting. Furthermore, techniques based on selective experience replay are model agnostic and allow experiences to be shared across different models. However, storing experiences from all previous tasks makes lifelong learning with selective experience replay computationally very expensive and impractical as the number of tasks increases. To that end, we propose a reward distribution-preserving coreset compression technique for compressing the experience replay buffers (ERBs) stored for selective experience replay. We evaluated the coreset compression technique on the brain tumor segmentation (BRATS) dataset for the task of ventricle localization and on whole-body MRI for localization of the left knee cap, left kidney, right trochanter, left lung, and spleen. The coreset lifelong learning models trained on a sequence of 10 different brain MR imaging environments demonstrated excellent performance, localizing the ventricle with a mean pixel error distance of 12.93 at a compression ratio of 10x. In comparison, the conventional lifelong learning model localized the ventricle with a mean pixel distance of 10.87. Similarly, on whole-body MRI there was no significant difference (p=0.28) between the 10x compressed coreset lifelong learning models and the conventional lifelong learning models across all the landmarks. The mean pixel distance for the 10x compressed models across all the landmarks was 25.30, compared to 19.24 for the conventional lifelong learning models. Our results demonstrate the potential of the coreset-based ERB compression method for compressing experiences without a significant drop in performance.
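To make the idea of reward distribution-preserving compression concrete, the following is a minimal sketch, not the authors' algorithm, which the abstract does not spell out. It assumes that "preserving the reward distribution" can be approximated by stratified sampling: experiences are grouped into equal-width reward bins and each bin is subsampled by the same factor. The names `Experience` and `compress_buffer`, the bin count, and the synthetic buffer are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict
from typing import List, NamedTuple


class Experience(NamedTuple):
    """One replay-buffer entry (hypothetical layout, for illustration only)."""
    state: tuple
    action: int
    reward: float
    next_state: tuple


def compress_buffer(buffer: List[Experience], ratio: int = 10,
                    n_bins: int = 10, seed: int = 0) -> List[Experience]:
    """Subsample a buffer by roughly `ratio` while keeping the share of
    experiences in each reward bin approximately unchanged.

    Assumption: stratified sampling over equal-width reward bins stands in
    for the coreset construction, which the abstract does not describe.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    if not buffer:
        return []

    rng = random.Random(seed)
    rewards = [e.reward for e in buffer]
    lo, hi = min(rewards), max(rewards)
    # Fall back to a width of 1.0 when every reward is identical.
    width = (hi - lo) / n_bins or 1.0

    # Group experiences by reward bin; the maximum reward goes in the last bin.
    bins = defaultdict(list)
    for e in buffer:
        idx = min(int((e.reward - lo) / width), n_bins - 1)
        bins[idx].append(e)

    compressed = []
    for idx in sorted(bins):
        members = bins[idx]
        # Keep at least one experience per non-empty bin so rare rewards survive.
        keep = max(1, round(len(members) / ratio))
        compressed.extend(rng.sample(members, keep))
    return compressed


# Synthetic buffer: 700 experiences with reward -1, 200 with 0, 100 with +1.
buffer = (
    [Experience((i,), 0, -1.0, (i + 1,)) for i in range(700)]
    + [Experience((i,), 1, 0.0, (i + 1,)) for i in range(200)]
    + [Experience((i,), 2, 1.0, (i + 1,)) for i in range(100)]
)
small = compress_buffer(buffer, ratio=10)
print(len(buffer), len(small), Counter(e.reward for e in small))
```

In this sketch the 10x compressed buffer keeps the same 70/20/10 split of rewards as the original. Uniform random subsampling would match that split only in expectation, which is the motivation for stratifying by reward here.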