Learning Optimal Decision Making for an Industrial Truck Unloading Robot using Minimal Simulator Runs

Consider a truck filled with boxes of varying size and unknown mass, and an industrial robot whose end-effectors can unload multiple boxes at once from any reachable location. In this work, we investigate how the robot, with the help of a simulator, can learn to maximize the number of boxes unloaded by each action. Most high-fidelity robotic simulators, including ours, are time-consuming, and such physics-based simulators are common for complex manipulation tasks involving multi-body interactions. Therefore, we study this learning problem with a focus on minimizing the number of simulation runs required. Under this setting, the optimal decision-making problem can be formulated as a multi-class classification problem. However, obtaining the outcome of any action requires running the time-consuming simulator, which restricts the amount of training data that can be collected. We therefore need a data-efficient approach that learns the classifier and generalizes from a minimal amount of data. To this end, we train an optimal decision tree as the classifier, and for each branch of the decision tree we reason about the confidence in the decision using a Probably Approximately Correct (PAC) framework to determine whether more simulator data will help reach a target confidence level. This gives us a mechanism to decide when simulation can be avoided for certain decisions and when simulation will improve decision making. For the truck unloading problem, our experiments show that the proposed method achieves a significant reduction in simulator runs compared with naively running the simulator to collect data for training equally performing decision trees.
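
The abstract does not state which PAC bound is applied to each branch. As an illustration only, the sketch below assumes a two-sided Hoeffding bound on each leaf's empirical accuracy: with n simulator outcomes reaching a leaf, the true accuracy lies within ε = sqrt(ln(2/δ) / (2n)) of the empirical estimate with probability at least 1 - δ. A leaf whose radius already falls below the target ε needs no further simulation; otherwise the bound gives how many more runs would reach it. The names `LeafStats`, `hoeffding_radius`, `samples_needed`, `needs_more_simulation`, `extra_runs`, and `accuracy_lower_bound`, and the tolerances ε and δ, are hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class LeafStats:
    """Hypothetical per-leaf record of simulator outcomes routed to one branch."""
    n_samples: int  # simulator runs whose state reached this leaf
    n_correct: int  # runs in which the leaf's predicted action was the best one


def hoeffding_radius(n: int, delta: float) -> float:
    """Two-sided Hoeffding half-width: |p_hat - p| <= eps with prob >= 1 - delta."""
    if n == 0:
        return float("inf")  # no data at this leaf yet, so no confidence at all
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def samples_needed(epsilon: float, delta: float) -> int:
    """Smallest n for which hoeffding_radius(n, delta) <= epsilon."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def needs_more_simulation(leaf: LeafStats, epsilon: float, delta: float) -> bool:
    """True if the leaf's accuracy estimate is not yet within epsilon at confidence 1 - delta."""
    return hoeffding_radius(leaf.n_samples, delta) > epsilon


def extra_runs(leaf: LeafStats, epsilon: float, delta: float) -> int:
    """Additional simulator runs at this leaf needed to reach the target radius."""
    return max(0, samples_needed(epsilon, delta) - leaf.n_samples)


def accuracy_lower_bound(leaf: LeafStats, delta: float) -> float:
    """Pessimistic estimate of the leaf's accuracy (empirical accuracy minus the radius)."""
    if leaf.n_samples == 0:
        return 0.0
    p_hat = leaf.n_correct / leaf.n_samples
    return max(0.0, p_hat - hoeffding_radius(leaf.n_samples, delta))


if __name__ == "__main__":
    epsilon, delta = 0.1, 0.05  # hypothetical target tolerance and failure probability
    leaves = {"well_sampled": LeafStats(200, 180), "sparse": LeafStats(30, 27)}
    for name, leaf in leaves.items():
        print(
            name,
            "more simulation" if needs_more_simulation(leaf, epsilon, delta) else "trust decision",
            "extra runs:", extra_runs(leaf, epsilon, delta),
            "lower bound: %.3f" % accuracy_lower_bound(leaf, delta),
        )
```

Because the bound is applied leaf by leaf, a guarantee that holds simultaneously over all L leaves would need a union bound (for example, replacing δ with δ/L). This is a design choice of the sketch, not a claim about the authors' method.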
