Object pose estimation is crucial for robotic applications and augmented reality. Beyond instance-level 6D object pose estimation, estimating category-level pose and shape has become a promising trend. As such, this new research field needs to be supported by well-designed datasets. To provide a benchmark with high-quality ground-truth annotations to the community, we introduce PhoCaL, a multimodal dataset for category-level object pose estimation with photometrically challenging objects. PhoCaL comprises 60 high-quality 3D models of household objects across 8 categories, including highly reflective, transparent and symmetric objects. We developed a novel robot-supported multimodal (RGB, depth, polarisation) data acquisition and annotation process. It ensures sub-millimeter pose accuracy for opaque textured, shiny and transparent objects, no motion blur and perfect camera synchronisation. To set a benchmark for our dataset, state-of-the-art RGB-D and monocular RGB methods are evaluated on the challenging scenes of PhoCaL.
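The abstract does not spell out the evaluation protocol. As a hedged illustration only, the sketch below shows the generic rotation/translation error metrics (e.g. the common 5°/5 cm criterion) often used to benchmark 6D pose methods; it is not PhoCaL's specific protocol, and the function names are hypothetical.

```python
import numpy as np

def rotation_error_deg(R_pred: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic distance between two 3x3 rotation matrices, in degrees."""
    cos_angle = (np.trace(R_pred @ R_gt.T) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def translation_error_cm(t_pred: np.ndarray, t_gt: np.ndarray) -> float:
    """Euclidean distance between predicted and ground-truth translation, in cm
    (assumes inputs are given in metres)."""
    return float(np.linalg.norm(t_pred - t_gt) * 100.0)

# Toy example: a prediction within a 5 degree / 5 cm threshold.
R_gt, t_gt = np.eye(3), np.array([0.10, 0.00, 0.50])
R_pred, t_pred = np.eye(3), np.array([0.11, 0.00, 0.51])
print(rotation_error_deg(R_pred, R_gt), translation_error_cm(t_pred, t_gt))
```

Symmetric objects (explicitly included in PhoCaL) would additionally require symmetry-aware variants of such metrics.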