In recent years, there has been a growing trend of using data-driven methods in industrial settings. These methods often process video frames or images of parts, so the integrity of such images is crucial. Datasets, e.g. consisting of images, can be altered or biased for various reasons, and it becomes critical to understand how the manipulation of video and images can impact the effectiveness of a machine learning method. Our case study analyzes the Linemod dataset, considered the state of the art in the 6D pose estimation context. Its images are accompanied by ArUco markers, which will clearly not be available in real-world settings. We analyze how the presence of the markers affects pose estimation accuracy and how this bias may be mitigated through data augmentation and other methods. Our work aims to show how the presence of these markers modifies, in the testing phase, the effectiveness of the deep learning method used. In particular, we demonstrate, through the tool of saliency maps, that the focus of the neural network is captured in part by these ArUco markers. Finally, we propose a new dataset, obtained by applying geometric tools to Linemod, in order to verify our hypothesis and uncover the bias. Our results demonstrate the potential for bias in 6DoF pose estimation networks and suggest methods for reducing this bias when training with markers.
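As a minimal sketch of the kind of marker removal discussed above, the snippet below detects ArUco markers in a Linemod frame with OpenCV (≥ 4.7) and inpaints them, so that the regions the markers occupy can no longer act as a spurious cue. The dictionary choice (DICT_4X4_50), the file path, and the use of Telea inpainting are illustrative assumptions, not the specific "geometric tools" used to build the proposed dataset.

```python
import cv2
import numpy as np

def mask_aruco_markers(image_bgr):
    """Detect ArUco markers in a BGR image and inpaint the regions they cover."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Marker dictionary is an assumption; Linemod boards may use a different one.
    aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    detector = cv2.aruco.ArucoDetector(aruco_dict, cv2.aruco.DetectorParameters())
    corners, ids, _ = detector.detectMarkers(gray)

    mask = np.zeros(gray.shape, dtype=np.uint8)
    if ids is not None:
        for c in corners:
            # Each detection is a 1x4x2 array of corner coordinates.
            cv2.fillConvexPoly(mask, c.reshape(-1, 2).astype(np.int32), 255)
        # Dilate slightly so inpainting also covers the marker border.
        mask = cv2.dilate(mask, np.ones((7, 7), np.uint8))

    # Telea inpainting fills the masked regions from surrounding pixels.
    return cv2.inpaint(image_bgr, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)

if __name__ == "__main__":
    img = cv2.imread("linemod_frame.png")  # placeholder path, not from the dataset release
    cv2.imwrite("linemod_frame_no_markers.png", mask_aruco_markers(img))
```

Images processed this way can be compared against the originals, for instance via saliency maps, to check whether the network's attention shifts away from the marker regions.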