Data-driven methods for physics-based character control using reinforcement learning have been successfully applied to generate high-quality motions. However, existing approaches typically rely on Gaussian distributions to represent the action policy, which can prematurely commit to suboptimal actions when solving high-dimensional continuous control problems for highly articulated characters. In this paper, to improve the learning performance of physics-based character controllers, we propose a framework that uses a particle-based action policy as a substitute for Gaussian policies. We exploit particle filtering to dynamically explore and discretize the action space, and to track the posterior policy represented as a mixture distribution. The resulting policy can replace the unimodal Gaussian policy that has been the staple for character control problems, without changing the underlying model architecture of the reinforcement learning algorithm used for policy optimization. We demonstrate the applicability of our approach on various motion capture imitation tasks. Baselines using our particle-based policies achieve better imitation performance and faster convergence than corresponding implementations using Gaussians, and are more robust to external perturbations during character control. Related code is available at: https://motion-lab.github.io/PFPN.
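To make the idea concrete, below is a minimal sketch (in PyTorch, not the authors' released PFPN implementation) of how a particle-based policy head can stand in for a Gaussian policy head: each action dimension carries a set of trainable particles (Gaussian components), and the network outputs state-dependent mixture weights over them, so sampling and log-probability evaluation keep the same interface expected by the surrounding policy-gradient code. Names such as ParticlePolicyHead and n_particles are hypothetical.

```python
# Minimal sketch of a particle-based (mixture) policy head, assumed structure only.
import torch
import torch.nn as nn

class ParticlePolicyHead(nn.Module):
    def __init__(self, feature_dim: int, action_dim: int, n_particles: int = 35):
        super().__init__()
        # Particle locations and scales are trainable state, shared across timesteps.
        self.loc = nn.Parameter(torch.linspace(-1.0, 1.0, n_particles).repeat(action_dim, 1))
        self.log_scale = nn.Parameter(torch.full((action_dim, n_particles), -1.0))
        # The network outputs state-dependent mixture weights (logits) per action dimension.
        self.weight_head = nn.Linear(feature_dim, action_dim * n_particles)
        self.action_dim, self.n_particles = action_dim, n_particles

    def distribution(self, features: torch.Tensor) -> torch.distributions.Distribution:
        logits = self.weight_head(features).view(-1, self.action_dim, self.n_particles)
        mix = torch.distributions.Categorical(logits=logits)
        comp = torch.distributions.Normal(self.loc, self.log_scale.exp())
        # A mixture over particles replaces the usual unimodal Gaussian policy.
        return torch.distributions.MixtureSameFamily(mix, comp)

# Usage: sample actions and log-probs exactly as with a Gaussian head, so the
# reinforcement learning algorithm performing policy optimization is unchanged.
features = torch.randn(8, 256)             # batch of state features
head = ParticlePolicyHead(256, 28)          # e.g. a 28-DoF character
dist = head.distribution(features)
actions = dist.sample()                     # shape: (8, 28)
log_probs = dist.log_prob(actions).sum(-1)  # per-sample log-likelihood
```

This sketch only illustrates the drop-in mixture policy; the particle filtering step described in the paper (dynamically exploring, resampling, and repositioning particles during training) would operate on top of these particle parameters and is not shown here.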