Classification is a major tool of statistics and machine learning. A classification method first processes a training set of objects with given classes (labels), with the goal of afterward assigning new objects to one of these classes. When running the resulting prediction method on the training data or on test data, it can happen that an object is predicted to lie in a class that differs from its given label. This is sometimes called label bias, and it raises the question of whether the object was mislabeled. Our goal is to visualize aspects of the data classification to gain insight. The proposed display reflects to what extent each object's label is (dis)similar to its prediction, how far each object lies from the other objects in its class, and whether some objects lie far from all classes. The display is constructed for discriminant analysis, the k-nearest neighbor classifier, support vector machines, logistic regression, and majority voting. It is illustrated on several benchmark datasets containing images and texts.
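The two per-object quantities described above can be illustrated with a minimal Python sketch. It is not the construction used for the proposed display, only one plausible way to compute comparable numbers. It assumes each object comes with posterior class probabilities from some classifier, and all function names (`alt_class_score`, `mean_distance_to_own_class`, `summarize`) are hypothetical. The score compares the probability of the best alternative class with that of the given label, and the distance is a plain Euclidean mean over the other objects sharing the label.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def alt_class_score(probs, given):
    """Score in [0, 1]: how strongly the classifier prefers another class.

    probs maps each class to its posterior probability for one object;
    given is the object's label. Values above 0.5 mean that some other
    class is more probable than the given label (an assumed measure).
    """
    p_given = probs[given]
    p_alt = max(p for c, p in probs.items() if c != given)
    total = p_given + p_alt
    return 0.5 if total == 0 else p_alt / total


def mean_distance_to_own_class(i, points, labels):
    """Mean Euclidean distance from object i to the other objects with its label."""
    same = [points[j] for j in range(len(points))
            if j != i and labels[j] == labels[i]]
    if not same:
        return float("nan")  # object is alone in its class
    return sum(dist(points[i], q) for q in same) / len(same)


def summarize(points, labels, posteriors):
    """Collect, per object, the prediction, the mismatch flag and both measures."""
    rows = []
    for i, (label, probs) in enumerate(zip(labels, posteriors)):
        predicted = max(probs, key=probs.get)  # most probable class
        rows.append({
            "label": label,
            "predicted": predicted,
            "mismatch": predicted != label,
            "alt_score": alt_class_score(probs, label),
            "own_class_distance": mean_distance_to_own_class(i, points, labels),
        })
    return rows
```

Plotting `alt_score` against `own_class_distance` for all objects with a given label would give a rough picture of which objects are both atypical for their class and claimed by another class. Such objects are natural candidates for a mislabeling check.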