DAgger is an imitation learning algorithm that aggregates its training dataset by querying the expert on every state encountered during training. To reduce the number of expert queries, we propose a modification of DAgger, called DADAgger, which queries the expert only for state-action pairs that are out of distribution (OOD). OOD states are identified by measuring the variance of the action predictions of an ensemble of models on each state, which we simulate using dropout. On the Car Racing and Half Cheetah environments, DADAgger achieves performance comparable to DAgger with fewer expert queries, and outperforms a random-sampling baseline. We also show that our algorithm can build efficient, well-balanced training datasets by starting with no initial data and querying the expert only to resolve uncertainty.
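The query rule described above can be sketched as follows: run several stochastic forward passes of the policy with dropout active, treat them as an ensemble, and query the expert only when the variance of the predicted actions is high. This is a minimal NumPy illustration under stated assumptions; the toy network, the variance threshold, and the helper names (`policy_with_dropout`, `should_query_expert`) are ours for illustration, not the paper's implementation.

```python
# Minimal sketch of dropout-as-ensemble OOD detection for expert querying.
# The network weights, threshold, and function names are illustrative
# assumptions, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer policy; random weights stand in for a trained model.
W1 = rng.normal(size=(8, 4))   # hidden_dim x state_dim
W2 = rng.normal(size=(2, 8))   # action_dim x hidden_dim

def policy_with_dropout(state, p_drop=0.5):
    """One stochastic forward pass with dropout on the hidden units."""
    h = np.maximum(W1 @ state, 0.0)        # ReLU hidden activations
    mask = rng.random(h.shape) > p_drop    # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)          # inverted-dropout scaling
    return W2 @ h                          # predicted action

def should_query_expert(state, k=32, threshold=1.0):
    """Simulate a k-member ensemble with dropout passes; flag the state as
    OOD (query the expert) if the mean per-dimension variance of the
    predicted actions exceeds the threshold."""
    actions = np.stack([policy_with_dropout(state) for _ in range(k)])
    uncertainty = float(actions.var(axis=0).mean())
    return uncertainty > threshold, uncertainty

state = rng.normal(size=4)
query, variance = should_query_expert(state)
print(query, variance)
```

States on which the dropout "ensemble" agrees are skipped, so the expert is consulted only where the policy is uncertain, which is what reduces the number of queries relative to vanilla DAgger.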