This study uses domain randomization to generate a synthetic RGB-D dataset for training multimodal instance segmentation models, with the aim of achieving colour-agnostic hand localization in cluttered industrial environments. Domain randomization is a simple technique for bridging the "reality gap": unrealistic features are rendered at random in the simulated scene, which forces the neural network to learn the essential features of the domain. We provide a new synthetic dataset for hand detection applications in industrial environments, together with ready-to-use pretrained instance segmentation models. To achieve robust results in a complex, unstructured environment, we use multimodal input that includes both colour and depth information, which we hypothesize improves the accuracy of the model's predictions. To test this hypothesis, we analyze the influence of each modality and of their combination. Although the evaluated models were trained solely on our synthetic dataset, we show that they outperform corresponding models trained on existing state-of-the-art datasets in terms of Average Precision and Probability-based Detection Quality.
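The idea of domain randomization described above can be illustrated with a minimal sketch of per-scene parameter sampling. Everything in it is an assumption made for illustration: the `SceneParams` fields, the value ranges, and the function names are hypothetical and are not taken from the dataset or its rendering pipeline. The sketch only shows the principle that appearance-related factors, such as hand colour, are drawn at random for every rendered scene so that a model cannot rely on them.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """Hypothetical per-render parameters; not the dataset's actual schema."""
    hand_rgb: tuple           # arbitrary, deliberately unrealistic hand colour
    light_intensity: float    # relative brightness of the main light
    num_distractors: int      # number of clutter objects placed in the scene
    camera_distance_m: float  # camera-to-hand distance in metres


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene configuration from assumed ranges."""
    return SceneParams(
        hand_rgb=tuple(rng.random() for _ in range(3)),
        light_intensity=rng.uniform(0.2, 2.0),
        num_distractors=rng.randint(0, 15),
        camera_distance_m=rng.uniform(0.4, 1.5),
    )


def sample_dataset(n: int, seed: int = 0) -> list:
    """Sample n scene configurations reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]
```

In such a setup, each sampled configuration would be passed to a renderer that outputs the colour image, the depth map, and the instance masks. Because the hand colour is drawn uniformly from the whole RGB cube, colour alone carries no reliable cue for the hand, which is one plausible way to encourage colour-agnostic localization.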