Recently, there has been tremendous interest in Industry 4.0 infrastructure to address labor shortages in global supply chains. Deploying artificial intelligence-enabled robotic bin picking systems in the real world has become particularly important for reducing the stress and physical demands on workers while increasing the speed and efficiency of warehouses. Such systems can automate order picking, but they risk causing expensive damage during abnormal events such as sensor failure. Reliability therefore becomes a critical factor in translating artificial intelligence research into real-world applications and products. In this paper, we propose MMRNet, a reliable object detection and segmentation system with MultiModal Redundancy, for robotic bin picking using data from different modalities. This is the first system that introduces the concept of multimodal redundancy to address sensor failure issues during deployment. In particular, we realize the multimodal redundancy framework with a gate fusion module and dynamic ensemble learning. Finally, we present a new label-free multimodal consistency (MC) score that uses the outputs from all modalities to measure the reliability and uncertainty of the overall system output. Through experiments, we demonstrate that in the event of a missing modality, our system provides much more reliable performance than baseline models. We also demonstrate that our MC score is a more reliable indicator of output quality at inference time than model-generated confidence scores, which are often over-confident.
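The abstract does not define the MC score, so the following is only an illustrative sketch of what a label-free consistency measure could look like, not the authors' formulation. It assumes each modality branch produces a binary segmentation mask for the same object and scores agreement as the mean pairwise intersection-over-union (IoU). The names `mask_iou`, `consistency_score`, and `rect`, and the representation of masks as sets of pixel coordinates, are hypothetical choices made for this example.

```python
from itertools import combinations


def mask_iou(a, b):
    """IoU of two binary masks, each given as a set of (row, col) pixels."""
    union = a | b
    if not union:
        return 1.0  # both masks empty: the two outputs agree
    return len(a & b) / len(union)


def consistency_score(masks):
    """Mean pairwise IoU across per-modality masks.

    Hypothetical stand-in for a label-free consistency measure:
    no ground-truth labels are used, only agreement between outputs.
    """
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("need at least two modality outputs")
    return sum(mask_iou(a, b) for a, b in pairs) / len(pairs)


def rect(r0, c0, r1, c1):
    """Helper: filled rectangle mask covering rows r0..r1-1, cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


# Assumed outputs from three branches for the same object.
rgb_mask = rect(0, 0, 4, 4)    # 16 pixels
depth_mask = rect(0, 0, 4, 4)  # identical to the RGB output
fused_mask = rect(0, 0, 4, 2)  # covers only half of the object

agree = consistency_score([rgb_mask, depth_mask, rect(0, 0, 4, 4)])
partial = consistency_score([rgb_mask, depth_mask, fused_mask])
```

In this sketch, full agreement between branches gives a score of 1.0, and the score drops as the per-modality outputs diverge. That is the intuition behind using cross-modality agreement, rather than a single model's confidence, as a reliability signal.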