Human-robot object handover is a key skill for the future of human-robot collaboration. The CORSMAL 2020 Challenge focuses on the perception part of this problem: the robot needs to estimate the filling mass of a container held by a human. Although powerful methods exist for image processing and audio processing individually, this problem requires processing data from multiple sensors jointly: the appearance of the container, the sound of the filling action, and the depth data all provide essential information. We propose a multi-modal method to predict three key indicators of the filling mass: filling type, filling level, and container capacity. These indicators are then combined to estimate the filling mass of the container. Our method achieved the Top-1 overall performance among all submissions to the CORSMAL 2020 Challenge on both the public and private subsets, while showing no evidence of overfitting. Our source code is publicly available: https://github.com/v-iashin/CORSMAL
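The abstract states that the three predicted indicators are combined into a filling-mass estimate. A minimal sketch of one natural way to do this combination, assuming mass = capacity × filling level × filling-type density (the density values and function names below are illustrative assumptions, not constants from the paper):

```python
# Hypothetical combination of the three predicted indicators into a
# filling-mass estimate. The density table is an illustrative assumption;
# the actual constants used by the method may differ.
DENSITY_G_PER_ML = {"water": 1.00, "rice": 0.85, "pasta": 0.41, "empty": 0.0}

def estimate_filling_mass(capacity_ml: float, filling_level: float, filling_type: str) -> float:
    """Return the estimated filling mass in grams.

    capacity_ml   -- predicted container capacity in millilitres
    filling_level -- predicted filling level as a fraction in [0, 1]
    filling_type  -- predicted filling type, a key of DENSITY_G_PER_ML
    """
    return capacity_ml * filling_level * DENSITY_G_PER_ML[filling_type]

# Example: a 500 mL container half-filled with water -> 250 g.
print(estimate_filling_mass(500.0, 0.5, "water"))
```

An empty container maps to zero filling mass regardless of the predicted capacity, since its density entry is zero.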