This paper presents an efficient approach to object manipulation planning that uses Monte Carlo Tree Search (MCTS) to find contact sequences and an efficient ADMM-based trajectory optimization algorithm to evaluate the dynamic feasibility of candidate contact sequences. To accelerate MCTS, we propose a methodology for learning a goal-conditioned policy-value network that directs the search towards promising nodes. Further, manipulation-specific heuristics drastically reduce the search space. Systematic object manipulation experiments in a physics simulator demonstrate the efficiency of our approach. In particular, thanks to the learned policy-value network, our approach scales favorably to long manipulation sequences and significantly improves the planning success rate.
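To make concrete how a learned policy-value network can direct a tree search towards promising nodes, the sketch below shows a PUCT-style child selection rule of the kind popularized by AlphaZero. The abstract does not state which selection rule is used, so PUCT is an assumption here, and the names `Node`, `puct_score`, `select_child`, and the constant `c_puct` are hypothetical, introduced only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical search node; one node per candidate contact choice.
    prior: float                 # probability assigned by the policy head (assumed)
    visit_count: int = 0         # number of times the search passed through this node
    value_sum: float = 0.0       # sum of value estimates backed up through this node
    children: dict = field(default_factory=dict)  # action -> Node

    def mean_value(self) -> float:
        # Average backed-up value; 0.0 for unvisited nodes.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Exploitation term (mean value) plus a prior-weighted exploration bonus
    # that shrinks as the child accumulates visits.
    exploration = (
        c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    )
    return child.mean_value() + exploration


def select_child(parent: Node, c_puct: float = 1.5):
    # Return the (action, child) pair with the highest PUCT score.
    return max(
        parent.children.items(),
        key=lambda item: puct_score(parent, item[1], c_puct),
    )
```

With this rule, an unvisited child with a high prior is tried first, while a child that has been visited many times without yielding value loses its bonus and gives way to its siblings. In the setting described above, the value backed up through a node would presumably come from the value head or from the trajectory optimizer's feasibility check, but that wiring is not specified in the abstract.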