Policy gradient methods are an effective way to learn continuous actions in an environment. This paper explains the mathematical formulation and its code implementation. Finally, it compares the rotation angle of the pole in CartPole with the turning angle of an autonomous vehicle, using the Bicycle Model, a simple kinematic model, to identify similarities between the two systems and thereby facilitate model transfer from CartPole to the F1tenth autonomous vehicle.
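As an illustration of the Bicycle Model mentioned above, the following is a minimal sketch of one common kinematic form (rear-axle reference point, forward Euler integration). It is not taken from the paper: the names `BicycleState` and `bicycle_step`, the default wheelbase of 0.33 m, and the time step are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class BicycleState:
    """Planar pose of the vehicle: position (m) and heading (rad)."""
    x: float = 0.0
    y: float = 0.0
    yaw: float = 0.0


def bicycle_step(state, speed, steering_angle, wheelbase=0.33, dt=0.01):
    """Advance the kinematic bicycle model by one Euler step.

    Assumed equations (rear-axle reference point):
        x'   = v * cos(yaw)
        y'   = v * sin(yaw)
        yaw' = v / L * tan(delta)
    The wheelbase default is a hypothetical value, not one from the paper.
    """
    x = state.x + speed * math.cos(state.yaw) * dt
    y = state.y + speed * math.sin(state.yaw) * dt
    yaw = state.yaw + speed / wheelbase * math.tan(steering_angle) * dt
    return BicycleState(x, y, yaw)
```

In this form the heading changes at a rate set by the speed and the steering angle, which is the quantity one would compare against the pole angle in CartPole when looking for similarities between the two systems.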