This paper studies how the method used to gather initial data affects the subsequent learning of a dynamics model. A dynamics model approximates the true transition function of a given task, so that policy search can be performed directly on the model rather than on the costly real system. The study aims to determine how to bootstrap a model as efficiently as possible by comparing the initialization methods employed in two different policy search frameworks from the literature. It focuses on model performance under the episode-based framework of evolutionary methods, using probabilistic ensembles. Experimental results show that various task-dependent factors can be detrimental to each method, which suggests exploring hybrid approaches.