The success of deep neural networks in image classification and learning can be partly attributed to the features they extract from images. The properties of the low-dimensional manifold that models extract and learn from images are often the subject of speculation, but this low-dimensional space is not yet well understood, either theoretically or empirically. In image classification models, the last hidden layer is where images of each class are separated from those of other classes, and it also has the fewest features. Here, we develop methods and formulations to study that feature space for any model. We study how the domain is partitioned in feature space, identify regions guaranteed to have certain classifications, and investigate the implications for pixel space. We observe that the geometric arrangement of decision boundaries in feature space differs significantly from that in pixel space, providing insights into adversarial vulnerabilities, image morphing, extrapolation, ambiguity in classification, and the mathematical understanding of image classification models.
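As an illustration of what "partitioning of the domain in feature space" can mean, the sketch below assumes the output layer is a single affine map followed by an argmax (a softmax in between does not change the argmax). Under that assumption, the set of feature vectors assigned to class c is an intersection of half-spaces and is therefore convex, so any segment joining two features of the same class stays in that class. The weights `W` and `b` are random stand-ins, and the helper names (`classify`, `class_region`, `in_region`, `segment_stays_in_class`) are hypothetical; this is not the authors' formulation.

```python
import numpy as np


def classify(W, b, z):
    """Predicted class of feature vector z under the affine output layer W z + b."""
    return int(np.argmax(W @ z + b))


def class_region(W, b, c):
    """Half-space description {z : A z + d >= 0} of the feature-space region of class c.

    Each row of A encodes "class c scores at least as high as one competing class".
    """
    others = [j for j in range(W.shape[0]) if j != c]
    A = W[c] - W[others]   # broadcasting: one row per competing class
    d = b[c] - b[others]
    return A, d


def in_region(A, d, z):
    """True if z satisfies every half-space constraint of the region."""
    return bool(np.all(A @ z + d >= 0.0))


def segment_stays_in_class(W, b, z0, z1, steps=50):
    """Check that every sampled point on the segment z0 -> z1 keeps z0's class."""
    c = classify(W, b, z0)
    ts = np.linspace(0.0, 1.0, steps)
    return all(classify(W, b, (1 - t) * z0 + t * z1) == c for t in ts)


# Random stand-in for a trained output layer (assumption: 4 classes, 3 features).
rng = np.random.default_rng(0)
n_classes, n_features = 4, 3
W = rng.normal(size=(n_classes, n_features))
b = rng.normal(size=n_classes)

# Draw feature vectors, pick two that share a predicted class, and test the segment.
samples = rng.normal(size=(100, n_features)) * 3.0
labels = [classify(W, b, z) for z in samples]
c = max(set(labels), key=labels.count)
z0, z1 = [z for z, lab in zip(samples, labels) if lab == c][:2]
print("class:", c, "segment stays in class:", segment_stays_in_class(W, b, z0, z1))
```

The convexity argument covers only the last affine layer. The corresponding regions in pixel space are pre-images under the nonlinear feature extractor and need not be convex, which is one way the two geometries can differ.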