Classification is a major tool of statistics and machine learning. A classification method first processes a training set of objects with given classes (labels), with the goal of afterward assigning new objects to one of these classes. When running the resulting prediction method on the training data or on test data, it can happen that an object is predicted to lie in a class that differs from its given label. This is sometimes called label bias, and it raises the question of whether the object was mislabeled. Our goal is to visualize aspects of the classification in order to gain insight into the data. The proposed display reflects to what extent each object's label is (dis)similar to its prediction, how far each object lies from the other objects in its class, and whether some objects lie far from all classes. The display is constructed for discriminant analysis, the k-nearest neighbor classifier, support vector machines, logistic regression, and majority voting. It is illustrated on several benchmark datasets containing images and texts.
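As a rough illustration of the two per-object quantities described above, the following minimal sketch computes (a) a score comparing the given label with the strongest competing class and (b) a distance from an object to its own class. The function names, the particular score (the competing probability divided by the sum of the competing and given-class probabilities), and the use of the Euclidean distance to the class centroid are assumptions made for this example. They are not necessarily the definitions used for the proposed display.

```python
import math


def alternative_class_score(probs, label):
    """Compare the given label with the strongest competing class.

    probs: dict mapping class name -> posterior probability for one object
    label: the object's given class (must be a key of probs)
    Returns (best alternative class, score in [0, 1]).
    A score above 0.5 means some other class is more probable than the label.
    NOTE: this particular score is an assumption chosen for illustration.
    """
    p_given = probs[label]
    # Find the most probable class other than the given label.
    alt_class, p_alt = max(
        ((c, p) for c, p in probs.items() if c != label),
        key=lambda cp: cp[1],
    )
    total = p_given + p_alt
    score = p_alt / total if total > 0 else 0.5
    return alt_class, score


def distance_to_class(x, members):
    """Euclidean distance from x to the centroid of its class members.

    NOTE: the centroid distance is a simple stand-in for "how far an
    object lies from the other objects in its class".
    """
    dim = len(x)
    centroid = [sum(m[j] for m in members) / len(members) for j in range(dim)]
    return math.dist(x, centroid)


# Example: an object labeled "cat" that the classifier sees mostly as "dog".
alt, score = alternative_class_score({"cat": 0.2, "dog": 0.7, "fox": 0.1}, "cat")
far = distance_to_class((3.0, 4.0), [(0.0, 0.0), (0.0, 0.0)])
```

Plotting the first quantity against the second for all objects of one class gives a display in the spirit of the one described: objects with a high score are candidates for mislabeling, while objects with a large distance lie far from their own class.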