Current convolutional neural network (CNN) classification methods predominantly focus on flat classification, which aims solely to identify a specified object within an image. However, real-world objects often possess a natural hierarchical organization that can significantly aid classification tasks. Capturing the relations between objects enables better contextual understanding as well as control over the severity of mistakes. With these aspects in mind, this paper proposes an end-to-end hierarchical model for object detection and classification built upon the YOLO model family. We introduce a novel hierarchical architecture, a modified loss function, and a performance metric tailored to the hierarchical nature of the model. The proposed model is trained and evaluated on two different hierarchical categorizations of the same dataset: a systematic categorization that disregards visual similarities between objects, and a categorization that accounts for visual characteristics shared across classes. The results illustrate how the proposed methodology addresses the inherent hierarchical structure of real-world objects, which conventional flat classification algorithms often overlook.
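A class hierarchy makes it possible to grade how bad a misclassification is, not just whether it happened. One common way to quantify this, shown here purely as an illustration and not necessarily the metric proposed in this paper, is the height of the lowest common ancestor (LCA) of the true and predicted labels: confusing two vehicles is a milder mistake than confusing a vehicle with an animal. The hierarchy, class names, and the `mistake_severity` helper below are hypothetical assumptions for this sketch.

```python
# Hypothetical two-level hierarchy: child label -> parent label.
# These class names are illustrative assumptions, not the paper's dataset.
PARENT = {
    "car": "vehicle",
    "truck": "vehicle",
    "bicycle": "vehicle",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "root",
    "animal": "root",
}


def ancestors(label: str) -> list[str]:
    """Return the path from a label up to the root, inclusive."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mistake_severity(true_label: str, predicted_label: str) -> int:
    """Height of the lowest common ancestor above the true leaf.

    0 means a correct prediction; larger values mean the prediction
    diverged from the truth higher up in the hierarchy.
    """
    true_path = ancestors(true_label)
    predicted_path = set(ancestors(predicted_label))
    for height, node in enumerate(true_path):
        if node in predicted_path:
            return height
    raise ValueError("labels do not share a common ancestor")


if __name__ == "__main__":
    print(mistake_severity("car", "car"))    # correct prediction
    print(mistake_severity("car", "truck"))  # same parent category
    print(mistake_severity("car", "dog"))    # diverges at the root
```

Under this sketch, a flat classifier would count all three errors equally, whereas the hierarchy separates a sibling confusion (severity 1) from a cross-category confusion (severity 2).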