Selective experience replay is a popular strategy for integrating lifelong learning with deep reinforcement learning. Selective experience replay aims to replay selected experiences from previous tasks to avoid catastrophic forgetting. Furthermore, selective experience replay-based techniques are model-agnostic and allow experiences to be shared across different models. However, storing experiences from all previous tasks makes lifelong learning with selective experience replay computationally very expensive and impractical as the number of tasks increases. To that end, we propose a reward distribution-preserving coreset compression technique for compressing the experience replay buffers (ERBs) stored for selective experience replay. We evaluated the coreset compression technique on the brain tumor segmentation (BRATS) dataset for the task of ventricle localization, and on whole-body MRI for localization of the left knee cap, left kidney, right trochanter, left lung, and spleen. The coreset lifelong learning models trained on a sequence of 10 different brain MR imaging environments demonstrated excellent performance in localizing the ventricle, with a mean pixel error distance of 12.93 at a compression ratio of 10x. In comparison, the conventional lifelong learning model localized the ventricle with a mean pixel error distance of 10.87. Similarly, for whole-body MRI, there was no significant difference (p=0.28) between the 10x compressed coreset lifelong learning models and the conventional lifelong learning models across all landmarks. The mean pixel distance for the 10x compressed models across all landmarks was 25.30, compared to 19.24 for the conventional lifelong learning models. Our results demonstrate the potential of the coreset-based ERB compression method for compressing experiences without a significant drop in performance.
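The abstract does not describe how the coreset is constructed. To make the idea of "reward distribution-preserving" compression at a 10x ratio concrete, here is a rough sketch. It keeps roughly one in ten transitions from a replay buffer while approximately matching the buffer's reward histogram, using plain proportional stratified sampling over equal-width reward bins. This is an assumed stand-in, not the authors' coreset algorithm. The function name `reward_stratified_subsample` and its parameters (`ratio`, `n_bins`, `seed`) are hypothetical.

```python
import math
import random


def reward_stratified_subsample(rewards, ratio=10, n_bins=10, seed=0):
    """Return sorted buffer indices (about len(rewards) // ratio of them) whose
    reward histogram tracks the full buffer's histogram.

    Hypothetical illustration: proportional stratified sampling over
    equal-width reward bins, NOT necessarily the paper's coreset method.
    """
    n = len(rewards)
    if n == 0:
        return []
    target = max(1, n // ratio)  # e.g. 10x compression keeps 1 in 10 transitions
    lo, hi = min(rewards), max(rewards)
    width = (hi - lo) / n_bins or 1.0  # all-equal rewards -> everything in bin 0

    # Group buffer indices into equal-width reward bins.
    members = [[] for _ in range(n_bins)]
    for i, r in enumerate(rewards):
        b = min(int((r - lo) / width), n_bins - 1)
        members[b].append(i)

    # Allocate the budget to bins in proportion to their size, using
    # largest-remainder rounding so the per-bin quotas sum to `target`.
    quotas = [target * len(m) / n for m in members]
    k = [math.floor(q) for q in quotas]
    leftover = max(0, target - sum(k))
    by_remainder = sorted(range(n_bins), key=lambda b: quotas[b] - k[b], reverse=True)
    for b in by_remainder[:leftover]:
        k[b] += 1

    # Sample uniformly without replacement inside each bin.
    rng = random.Random(seed)
    picked = []
    for b in range(n_bins):
        take = min(k[b], len(members[b]))
        picked.extend(rng.sample(members[b], take))
    return sorted(picked)
```

For example, a 1,000-transition buffer in which 70% of the rewards are negative compresses to 100 transitions, and about 70% of those still carry negative rewards. Stratifying on reward is only one way to keep the reward distribution intact. A real coreset construction would likely also account for the state or image content of each transition.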