This paper analyzes the simulation-to-reality gap in reinforcement learning (RL) for cyber-physical systems with fractional delays (i.e., delays that are non-integer multiples of the sampling period). Accounting for fractional delay has important implications for the nature of the cyber-physical system under consideration. Systems with delays are non-Markovian, and the system state vector must be extended to make the system Markovian. We show that this is not possible when the delay is in the output, in which case the problem always remains non-Markovian. Based on this analysis, we propose a sampling scheme that results in efficient RL training and in agents that perform well in realistic multirotor unmanned aerial vehicle simulations. We demonstrate that the resulting agents do not produce excessive oscillations, unlike RL agents trained without time delay in the model.
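The state extension mentioned above can be illustrated with a minimal sketch for the simpler case of a fractional delay at the input. It is not the sampling scheme proposed in the paper. It assumes a hypothetical scalar plant dx/dt = a*x + b*u(t - tau) with a zero-order hold and 0 <= tau < T, and uses the standard textbook discretization in which the delayed input splits into contributions from the current and the previous action. The function names and the numeric values are illustrative assumptions.

```python
import math


def discretize_with_input_delay(a, b, T, tau):
    """Zero-order-hold discretization of dx/dt = a*x + b*u(t - tau).

    Assumes a != 0 and a fractional input delay 0 <= tau < T.
    Returns (phi, gamma0, gamma1) such that
        x[k+1] = phi*x[k] + gamma0*u[k] + gamma1*u[k-1].
    """
    if a == 0.0:
        raise ValueError("this sketch assumes a != 0")
    if not 0.0 <= tau < T:
        raise ValueError("this sketch assumes 0 <= tau < T")
    phi = math.exp(a * T)
    # u[k] acts on the plant only during the last (T - tau) of the interval.
    gamma0 = b * (math.exp(a * (T - tau)) - 1.0) / a
    # u[k-1] still acts during the first tau of the interval.
    gamma1 = b * math.exp(a * (T - tau)) * (math.exp(a * tau) - 1.0) / a
    return phi, gamma0, gamma1


def augmented_step(z, u, coeffs):
    """One step of the extended state z = (x[k], u[k-1]).

    The next extended state depends only on z and the current action u,
    which is the Markov property the state extension is meant to restore.
    """
    x, u_prev = z
    phi, gamma0, gamma1 = coeffs
    x_next = phi * x + gamma0 * u + gamma1 * u_prev
    return (x_next, u)


# Hypothetical values: 10 ms sampling period, 3.5 ms input delay.
coeffs = discretize_with_input_delay(a=-1.0, b=2.0, T=0.010, tau=0.0035)
z = (1.0, 0.0)
for action in (0.5, 0.5, -0.2):
    z = augmented_step(z, action, coeffs)
```

Storing the previous action alongside x is enough here because the delay enters through the input. According to the abstract, no such extension exists when the delay is in the output, so this sketch does not cover that case.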