Domain Randomization (DR) is known to require a significant amount of training data for good performance. We argue that this is due to DR's strategy of generating data by sampling uniformly over the simulation parameters; as a result, DR often produces samples that are uninformative for the learner. In this work, we theoretically analyze DR using ideas from multi-source domain adaptation. Based on our findings, we propose Adversarial Domain Randomization (ADR), an efficient variant of DR that generates adversarial samples with respect to the learner during training. We implement ADR as a policy whose action space is the quantized simulation parameter space. At each iteration, the policy's action generates labeled data, and the reward is set to the learner's loss on this data (equivalently, the negative of the learner's reward), so the policy is driven toward samples the learner finds difficult. As a result, we observe that ADR frequently generates novel samples for the learner, such as truncated and occluded objects for object detection and confusable classes for image classification. We evaluate on the CLEVR, Syn2Real, and VIRAT datasets across various tasks, and demonstrate that ADR outperforms DR while generating fewer data samples.
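To make the mechanism concrete, here is a minimal, self-contained sketch of the loop described above. Everything in it is an illustrative assumption rather than the paper's implementation: the bin count, learning rates, the REINFORCE-style policy update, the toy two-Gaussian "simulator" (`render`), and the logistic-regression learner (`train_step`) all stand in for a real renderer and task model.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_BINS = 16     # size of the quantized simulation parameter space (assumed)
POLICY_LR = 0.5   # policy step size (assumed)

logits = np.zeros(NUM_BINS)   # categorical policy over parameter bins
w = np.zeros(2)               # toy learner: logistic-regression weights

def render(bin_idx, n=64):
    """Toy stand-in simulator: the chosen bin controls class separation,
    so high-index bins yield harder, more confusable samples."""
    sep = 2.0 * (1.0 - bin_idx / (NUM_BINS - 1))
    y = rng.integers(0, 2, n)
    x = rng.normal(0.0, 1.0, (n, 2)) + sep * (2 * y[:, None] - 1)
    return x, y

def train_step(x, y, lr=0.05):
    """One SGD step on the learner; returns its cross-entropy loss."""
    global w
    p = 1.0 / (1.0 + np.exp(-x @ w))
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    w -= lr * x.T @ (p - y) / len(y)
    return loss

for step in range(500):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = rng.choice(NUM_BINS, p=probs)    # policy action = simulation parameters
    x, y = render(a)                     # action generates labeled data
    reward = train_step(x, y)            # reward = learner's loss (adversarial)
    grad = -probs.copy()
    grad[a] += 1.0                       # gradient of log pi(a)
    logits += POLICY_LR * reward * grad  # REINFORCE update

probs = np.exp(logits - logits.max())
probs /= probs.sum()
print("policy mass on the 4 hardest bins:", probs[-4:].round(2))
```

In the paper's setting, the discrete actions would index bins of actual simulation parameters (e.g., object pose, occlusion, lighting, texture); the toy collapses difficulty into a single separation knob, but the dynamics are the same: bins that keep the learner's loss high accumulate probability mass, so data generation concentrates on informative samples.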