Recent work in Machine Learning and Computer Vision has provided evidence of systematic design flaws in the development of major object recognition benchmark datasets. One such example is ImageNet, where, for several categories of images, there are incongruences between the objects the images represent and the labels used to annotate them. This problem has major consequences, especially given the large number of machine learning applications, not least those based on Deep Neural Networks, that have been trained on these datasets. In this paper we posit that the root of the problem is the lack of a knowledge representation (KR) methodology providing the foundations for the construction of these ground truth benchmark datasets. Accordingly, we propose a solution articulated in three main steps: (i) deconstructing the object recognition process into four ordered stages grounded in the philosophical theory of teleosemantics; (ii) based on this stratification, proposing a novel four-phase methodology for organizing objects in classification hierarchies according to their visual properties; and (iii) performing such classification according to the faceted classification paradigm. The key novelty of our approach is that we construct the classification hierarchies from visual properties, exploiting visual genus-differentiae, rather than from linguistically grounded properties. The proposed approach is validated by a set of experiments on the ImageNet hierarchy of musical instruments.
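The abstract does not specify how the hierarchies are represented, so the Python sketch below is only a toy illustration, under stated assumptions, of what classifying "by visual genus-differentiae" can mean: each class is its parent class (the genus) plus one visible property-value pair that distinguishes it from its siblings (the differentia), and an object is classified by descending the tree along the differentiae it exhibits. The names `VisualClass` and `classify`, and the instrument properties used in the example, are hypothetical and are not taken from the paper or from ImageNet.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VisualClass:
    """Hypothetical node in a genus-differentia hierarchy.

    The genus is the parent node; the differentia is a (visual property, value)
    pair that distinguishes this class from its siblings. The root has none.
    """
    name: str
    differentia: Optional[Tuple[str, str]] = None
    children: List[VisualClass] = field(default_factory=list)

    def add(self, name: str, prop: str, value: str) -> VisualClass:
        """Create a subclass of this genus defined by one visual differentia."""
        child = VisualClass(name, (prop, value))
        self.children.append(child)
        return child


def classify(root: VisualClass, obj: Dict[str, str]) -> List[str]:
    """Return the path of class names from the root to the most specific match.

    At each level, follow the first child whose differentia the object's visual
    properties satisfy. Assumes sibling differentiae are mutually exclusive.
    """
    path = [root.name]
    node = root
    while True:
        match = next(
            (c for c in node.children if obj.get(c.differentia[0]) == c.differentia[1]),
            None,
        )
        if match is None:
            return path
        path.append(match.name)
        node = match


# Illustrative hierarchy with invented visual properties (not ImageNet data).
root = VisualClass("musical instrument")
root.add("keyboard instrument", "has_keys", "yes")
strings = root.add("string instrument", "visible_strings", "yes")
strings.add("harp", "frame_shape", "triangular")
strings.add("lute", "body_shape", "pear")

if __name__ == "__main__":
    print(classify(root, {"visible_strings": "yes", "frame_shape": "triangular"}))
```

Running the script prints the path from the root through "string instrument" to "harp". The point of the design is that every step down the hierarchy is justified by something visible in the object, rather than by how the category happens to be named in a lexical resource.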