Object recognition has for the most part been approached as a one-hot problem that treats classes as discrete and unrelated. Each image region must be assigned to exactly one member of a set of objects, including a background class, disregarding any similarities between object types. In this work, we compare the error statistics of class embeddings learned with a one-hot approach against those of semantically structured embeddings from natural language processing or knowledge graphs, which are widely applied in open-world object detection. Extensive experiments with multiple knowledge embeddings and distance metrics indicate that knowledge-based class representations produce more semantically grounded misclassifications while performing on par with one-hot methods on the challenging COCO and Cityscapes object detection benchmarks. We generalize our findings to multiple object detection architectures by proposing a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
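The contrast between one-hot and semantically structured class representations can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the three class names, the hand-picked toy vectors, and the helper functions `cosine_similarity_matrix` and `ranked_classes` are all assumptions made for illustration, and cosine similarity is only one possible distance metric.

```python
import numpy as np

# Hypothetical class set and toy vectors (not real word or knowledge-graph embeddings).
CLASS_NAMES = ["car", "truck", "person"]

# One-hot targets: every class is orthogonal to every other class.
ONE_HOT = np.eye(len(CLASS_NAMES))

# Assumed "semantic" embeddings: car and truck are deliberately placed close together.
SEMANTIC = np.array([
    [1.0, 0.2, 0.0],  # car
    [0.9, 0.4, 0.0],  # truck
    [0.0, 0.1, 1.0],  # person
])


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of `embeddings`."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T


def ranked_classes(region_embedding, class_embeddings):
    """Class names sorted from most to least similar to a predicted region embedding."""
    unit_classes = class_embeddings / np.linalg.norm(
        class_embeddings, axis=1, keepdims=True
    )
    unit_region = region_embedding / np.linalg.norm(region_embedding)
    scores = unit_classes @ unit_region
    order = np.argsort(-scores)  # descending similarity
    return [CLASS_NAMES[i] for i in order]


if __name__ == "__main__":
    # With one-hot targets all distinct classes are equally dissimilar.
    print(cosine_similarity_matrix(ONE_HOT))
    # With structured embeddings, car is closer to truck than to person.
    print(cosine_similarity_matrix(SEMANTIC))
    # A region embedding near "car": the runner-up is the related class "truck".
    print(ranked_classes(np.array([1.0, 0.2, 0.1]), SEMANTIC))
```

Under one-hot targets the similarity matrix is the identity, so no wrong class is preferred over another. With structured embeddings, the second-ranked class for a car-like prediction is the semantically related one, which is the kind of misclassification behaviour the comparison above is concerned with.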