The empirical success of Reinforcement Learning (RL) in the setting of contact-rich manipulation leaves much to be understood from a model-based perspective, where the key difficulties are often attributed to (i) the explosion of contact modes, (ii) stiff, non-smooth contact dynamics and the resulting exploding / discontinuous gradients, and (iii) the non-convexity of the planning problem. The stochastic nature of RL addresses (i) and (ii) by effectively sampling and averaging the contact modes. On the other hand, model-based methods have tackled the same challenges by smoothing contact dynamics analytically. Our first contribution is to establish the theoretical equivalence of the two methods for simple systems, and to provide qualitative and empirical equivalence on a number of complex examples. In order to further alleviate (ii), our second contribution is a convex, differentiable and quasi-dynamic formulation of contact dynamics, which is amenable to both smoothing schemes and has proven through experiments to be highly effective for contact-rich planning. Our final contribution resolves (iii), where we show that classical sampling-based motion planning algorithms can be effective in global planning when contact modes are abstracted via smoothing. Applying our method to a collection of challenging contact-rich manipulation tasks, we demonstrate that efficient model-based motion planning can achieve results comparable to RL with dramatically less computation. Video: https://youtu.be/12Ew4xC-VwA
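To make the claimed equivalence of the two smoothing schemes concrete, here is a minimal sketch (ours, not from the paper) on a 1D stand-in for stiff contact: f(x) = max(0, x), which is non-smooth at x = 0 like a contact force at touchdown. Randomized smoothing averages f over sampled Gaussian perturbations, as RL implicitly does; analytic smoothing evaluates the same Gaussian convolution in closed form. The choice of f, the noise scale sigma, and the sample count are all illustrative assumptions.

```python
# Illustrative sketch: randomized vs. analytic Gaussian smoothing of a
# non-smooth 1D "contact" function f(x) = max(0, x). Not the paper's code.
import numpy as np
from scipy.stats import norm


def f(x):
    """Non-smooth stand-in for contact: zero force out of contact, linear in contact."""
    return np.maximum(0.0, x)


def randomized_smoothing(x, sigma, n_samples=100_000, seed=0):
    """RL-style smoothing: Monte Carlo average of f over Gaussian perturbations."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, size=n_samples)
    return f(x + w).mean()


def analytic_smoothing(x, sigma):
    """Model-based smoothing: closed-form Gaussian convolution of f.

    For f(x) = max(0, x) and W ~ N(0, sigma^2):
        E[f(x + W)] = sigma * phi(x / sigma) + x * Phi(x / sigma),
    where phi and Phi are the standard normal pdf and cdf.
    """
    z = x / sigma
    return sigma * norm.pdf(z) + x * norm.cdf(z)


for x in [-0.5, 0.0, 0.5]:
    mc = randomized_smoothing(x, sigma=0.2)
    cf = analytic_smoothing(x, sigma=0.2)
    print(f"x={x:+.1f}  Monte Carlo={mc:.4f}  analytic={cf:.4f}")
```

Both schemes produce the same smooth surrogate (up to Monte Carlo error), and the surrogate is differentiable even at the non-smooth point x = 0, which is the property the planning methods above exploit.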