Hierarchical classification (HC) assigns each object multiple labels organized into a hierarchical structure. Existing deep-learning-based HC methods usually classify an instance by descending from the root node until a leaf node is reached. However, in the real world, images degraded by noise, occlusion, blur, or low resolution may not provide sufficient information for classification at subordinate levels. To address this issue, we propose a novel semantic guided level-category hybrid prediction network (SGLCHPN) that jointly performs level and category prediction in an end-to-end manner. SGLCHPN comprises two modules: a visual transformer that extracts feature vectors from the input images, and a semantic guided cross-attention module that uses category word embeddings as queries to guide the learning of category-specific representations. To evaluate the proposed method, we construct two new datasets in which images span a broad range of quality and are therefore labeled to different levels (depths) in the hierarchy according to their individual quality. Experimental results demonstrate the effectiveness of our proposed HC method.
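The two ideas in the abstract, category word embeddings acting as attention queries over visual features and a level prediction that decides how deep the label path goes, can be illustrated with a minimal sketch. This is not the SGLCHPN implementation: the function names `cross_attention` and `hybrid_predict`, the single-head attention without learned projections, and the rule of truncating the path at the most probable level are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, tokens):
    """Each query (a category word embedding) attends over the visual tokens.

    Returns one category-specific feature vector per query.
    Assumptions: single head, no learned projections, and the visual
    tokens serve as both keys and values.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        weights = softmax([dot(q, t) * scale for t in tokens])
        outputs.append(
            [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
        )
    return outputs


def hybrid_predict(level_probs, category_probs_per_level, names_per_level):
    """Combine a level (depth) prediction with per-level category predictions.

    Hypothetical decoding rule: take the most probable depth, then report
    the most probable category at each level down to that depth. The argmax
    is taken independently per level, so parent-child consistency is not
    enforced here.
    """
    depth = max(range(len(level_probs)), key=level_probs.__getitem__) + 1
    path = []
    for lvl in range(depth):
        probs = category_probs_per_level[lvl]
        best = max(range(len(probs)), key=probs.__getitem__)
        path.append(names_per_level[lvl][best])
    return path


# Toy hierarchy with three levels; all probabilities are made up.
names = [["animal", "vehicle"], ["cat", "dog"], ["husky", "beagle"]]
category_probs = [[0.9, 0.1], [0.3, 0.7], [0.5, 0.5]]
level_probs = [0.2, 0.7, 0.1]  # the image supports labeling down to level 2 only
print(hybrid_predict(level_probs, category_probs, names))
```

In this toy case the breed-level scores are uninformative, as they might be for a blurred or low-resolution image, and the level prediction stops the path at the second level instead of forcing a leaf label.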