Visual Object Recognition in Indoor Environments Using Topologically Persistent Features

Object recognition in unseen indoor environments remains a challenging problem for visual perception of mobile robots. In this letter, we propose the use of topologically persistent features, which rely on the objects' shape information, to address this challenge. In particular, we extract two kinds of features, namely, sparse persistence image (PI) and amplitude, by applying persistent homology to multi-directional height function-based filtrations of the cubical complexes representing the object segmentation maps. The features are then used to train a fully connected network for recognition. For performance evaluation, in addition to a widely used shape dataset and a benchmark indoor scenes dataset, we collect a new dataset, comprising scene images from two different environments, namely, a living room and a mock warehouse. The scenes are captured using varying camera poses under different illumination conditions and include up to five different objects from a given set of fourteen objects. On the benchmark indoor scenes dataset, sparse PI features show better recognition performance in unseen environments than the features learned using the widely used ResNetV2-56 and EfficientNet-B4 models. Further, they provide slightly higher recall and accuracy values than Faster R-CNN, an end-to-end object detection method, and its state-of-the-art variant, Domain Adaptive Faster R-CNN. The performance of our methods also remains relatively unchanged from the training environment (living room) to the unseen environment (mock warehouse) in the new dataset. In contrast, the performance of the object detection methods drops substantially. We also implement the proposed method on a real-world robot to demonstrate its usefulness.
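The feature-extraction step described above (persistent homology of a height-function filtration on an object segmentation map) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the segmentation map is a small binary mask, treats foreground pixels as vertices joined by 4-connectivity, and computes only 0-dimensional persistence (connected components) with a union-find structure. The function names `height_filtration` and `h0_persistence` are hypothetical, and the conversion of diagrams into sparse persistence images or amplitudes is not shown.

```python
import math


def height_filtration(mask, direction):
    """Give each foreground pixel its height along `direction` = (d_row, d_col).

    Heights are shifted so the lowest foreground pixel has value 0.
    Background pixels are left out, so they never enter the filtration.
    """
    d_row, d_col = direction
    heights = {
        (r, c): r * d_row + c * d_col
        for r, row in enumerate(mask)
        for c, v in enumerate(row)
        if v
    }
    if not heights:
        return {}
    lowest = min(heights.values())
    return {p: h - lowest for p, h in heights.items()}


def h0_persistence(filtration):
    """Return sorted (birth, death) pairs of connected components.

    Pixels enter in order of increasing filtration value. When two components
    merge, the younger one dies (elder rule). Zero-length bars are dropped.
    Components that never die get death = math.inf.
    """
    parent, birth, pairs = {}, {}, []

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    ordered = sorted(filtration.items(), key=lambda kv: (kv[1], kv[0]))
    for (r, c), value in ordered:
        parent[(r, c)] = (r, c)
        birth[(r, c)] = value
        # Assumption: 4-connectivity between pixels already in the filtration.
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb not in parent:
                continue
            a, b = find((r, c)), find(nb)
            if a == b:
                continue
            elder, younger = (a, b) if birth[a] <= birth[b] else (b, a)
            if birth[younger] < value:
                pairs.append((birth[younger], value))
            parent[younger] = elder

    roots = {find(p) for p in parent}
    pairs.extend((birth[root], math.inf) for root in roots)
    return sorted(pairs)


# A U-shaped toy mask: two arms joined along the bottom row.
u_shape = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]

top_down = h0_persistence(height_filtration(u_shape, (1, 0)))
bottom_up = h0_persistence(height_filtration(u_shape, (-1, 0)))
print("top-down :", top_down)
print("bottom-up:", bottom_up)
```

Sweeping the toy mask from the top sees two arms that merge at the bottom row, while sweeping from the bottom sees a single component throughout. This difference between directions is the kind of shape information a multi-directional filtration is meant to capture.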
