Data augmentation techniques from computer vision are widely regarded as a regularization method that improves data efficiency and generalization performance in vision-based reinforcement learning. We vary the timing of augmentation and find that the right timing depends critically on the tasks to be solved in training and testing. According to our experiments on the OpenAI Procgen Benchmark, if the regularization imposed by an augmentation is helpful only in testing, it is better to postpone the augmentation until after training than to use it during training, in terms of both sample and computation complexity. We note that some such augmentations can disturb the training process. Conversely, an augmentation whose regularization is useful in training needs to be used throughout the whole training period to fully realize its benefit, in terms of not only generalization but also data efficiency. These phenomena suggest that controlling the timing of data augmentation is useful in reinforcement learning.
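To make the notion of timing concrete, the following is a minimal sketch of how an augmentation could be switched on either throughout training or only in a phase that follows it. It is an illustration under stated assumptions, not the procedure used in the experiments: the function names (`random_shift`, `augmentation_active`, `prepare_batch`), the `timing` labels, the `rl_steps` boundary, and the choice of a pad-and-crop shift as the augmentation are all hypothetical.

```python
import numpy as np


def random_shift(obs, pad=4, rng=None):
    """Pad an HxWxC image by edge replication, then crop back to HxW at a random offset."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = obs.shape
    padded = np.pad(obs, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]


def augmentation_active(step, rl_steps, timing):
    """Hypothetical schedule deciding whether to augment at a given update step.

    "during": augment at every step.
    "after":  augment only once the first `rl_steps` updates are done.
    """
    if timing == "during":
        return True
    if timing == "after":
        return step >= rl_steps
    raise ValueError(f"unknown timing: {timing!r}")


def prepare_batch(batch, step, rl_steps, timing, rng=None):
    """Return the observation batch (N, H, W, C), augmented only when the schedule allows."""
    if not augmentation_active(step, rl_steps, timing):
        return batch
    return np.stack([random_shift(obs, rng=rng) for obs in batch])
```

With `timing="after"`, batches pass through unchanged for the first `rl_steps` updates, so no augmentation cost is paid and the agent learns on clean observations; the augmentation only enters in the later phase. With `timing="during"`, every batch is augmented from the first update onward.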