Classification is a major tool of statistics and machine learning. A classification method first processes a training set of objects with given classes (labels), with the goal of subsequently assigning new objects to one of these classes. When the resulting prediction method is run on the training data or on test data, an object may be predicted to lie in a class that differs from its given label. This is sometimes called label bias, and it raises the question of whether the object was mislabeled. The proposed class map reflects the probability that an object belongs to an alternative class, how far it is from the other objects in its given class, and whether some objects lie far from all classes. The goal is to visualize aspects of the classification results in order to gain insight into the data. The display is constructed for discriminant analysis, the k-nearest neighbor classifier, support vector machines, logistic regression, and coupling pairwise classifications. It is illustrated on several benchmark datasets, including some involving images and texts.
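To make the "probability that an object belongs to an alternative class" concrete, here is a minimal sketch that works from any classifier's matrix of posterior class probabilities. The abstract does not give a formula, so the quantity below is an assumption: for object i with given class g_i, take the most probable other class b_i and compute PAC_i = p(b_i | x_i) / (p(g_i | x_i) + p(b_i | x_i)). The function name `prob_alternative_class` is hypothetical, and the sketch does not attempt the farness axis, which would require a distance measure within each class.

```python
import numpy as np


def prob_alternative_class(posteriors, labels):
    """Hypothetical sketch of a 'probability of alternative class' score.

    Assumption (not stated in the abstract):
        PAC_i = p(b_i | x_i) / (p(g_i | x_i) + p(b_i | x_i)),
    where g_i is the given label and b_i the most probable other class.

    posteriors: (n, K) array of class probabilities, one row per object.
    labels:     length-n array of given class indices in 0..K-1.
    Returns (pac, best_alt): the scores and the index b_i for each object.
    """
    P = np.asarray(posteriors, dtype=float)
    y = np.asarray(labels)
    rows = np.arange(P.shape[0])

    p_given = P[rows, y]

    # Hide the given class so argmax picks the best *alternative* class.
    masked = P.copy()
    masked[rows, y] = -np.inf
    best_alt = masked.argmax(axis=1)
    p_alt = P[rows, best_alt]

    # If both probabilities are zero the ratio is undefined; use 0.5 (assumption).
    denom = p_given + p_alt
    pac = np.divide(p_alt, denom, out=np.full(P.shape[0], 0.5), where=denom > 0)
    return pac, best_alt


# Usage: three objects, three classes; the second object (given class 1)
# gets more probability on class 2, so its PAC exceeds 0.5.
P = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6],
     [0.2, 0.5, 0.3]]
labels = [0, 1, 1]
pac, best_alt = prob_alternative_class(P, labels)
flagged = pac > 0.5  # candidates for label bias / possible mislabeling
```

Under this formulation, PAC exceeds 0.5 exactly when the best alternative class is more probable than the given class, which (ignoring ties) is when the predicted class differs from the given label. Values near 1 therefore mark the objects whose labels deserve a second look.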