Transparent objects present multiple distinct challenges to visual perception systems. First, their lack of distinguishing visual features makes them harder to detect and localize than opaque objects. Even humans find transparent surfaces that exhibit little specular reflection or refraction, e.g. glass doors, difficult to perceive. A second challenge is that depth sensors typically used for opaque object perception cannot obtain accurate depth measurements on transparent objects due to their unique reflective properties. Stemming from these challenges, we observe that transparent object instances within the same category (e.g. cups) look more similar to one another than to ordinary opaque objects of that same category. Given this observation, the present paper explores category-level transparent object pose estimation rather than instance-level pose estimation. We propose TransNet, a two-stage pipeline that learns to estimate category-level transparent object pose using localized depth completion and surface normal estimation. TransNet is evaluated in terms of pose estimation accuracy on a recent, large-scale transparent object dataset and compared against a state-of-the-art category-level pose estimation approach. Results from this comparison demonstrate that TransNet achieves improved pose estimation accuracy on transparent objects, and key findings from the included ablation studies suggest directions for future performance improvements.
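To make the two-stage structure concrete, the minimal PyTorch-style sketch below shows one plausible arrangement: a first stage that predicts completed depth and surface normals from an RGB crop, and a second stage that regresses a category-level pose from the concatenated inputs. The module names, layer sizes, and pose parameterization here are illustrative assumptions, not the architecture described in the paper.

```python
import torch
import torch.nn as nn

class DepthNormalStage(nn.Module):
    """Stage 1 (illustrative): predict completed depth and surface normals
    from an RGB crop of a detected transparent object. The tiny backbone and
    single-conv heads are placeholders, not TransNet's actual layers."""
    def __init__(self, in_channels: int = 3, feat_channels: int = 32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(feat_channels, 1, 1)    # completed depth
        self.normal_head = nn.Conv2d(feat_channels, 3, 1)   # surface normals

    def forward(self, rgb: torch.Tensor):
        feats = self.backbone(rgb)
        depth = self.depth_head(feats)
        normals = nn.functional.normalize(self.normal_head(feats), dim=1)
        return depth, normals

class PoseStage(nn.Module):
    """Stage 2 (illustrative): regress a category-level pose and scale from
    RGB, completed depth, and normals. A pooled MLP stands in for whatever
    feature extraction and rotation handling the real pipeline uses."""
    def __init__(self, feat_channels: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3 + 1 + 3, feat_channels, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # 3 translation + 6 rotation (continuous 6D repr.) + 3 scale
        self.pose_head = nn.Linear(feat_channels, 3 + 6 + 3)

    def forward(self, rgb, depth, normals):
        x = torch.cat([rgb, depth, normals], dim=1)
        return self.pose_head(self.encoder(x).flatten(1))

# Usage: run both stages on a dummy RGB crop of a transparent object.
rgb = torch.rand(1, 3, 64, 64)
depth, normals = DepthNormalStage()(rgb)
pose = PoseStage()(rgb, depth, normals)
print(pose.shape)  # torch.Size([1, 12])
```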