We propose a novel methodology for multi-class classification in arbitrary feature spaces that yields a potentially well-calibrated classifier. Calibrated classifiers matter in many applications because, beyond predicting class labels, they also provide a confidence level for each prediction. Training proceeds in two steps. First, the training data is represented in a latent space whose geometry is induced by a regular $(n-1)$-dimensional simplex, where $n$ is the number of classes. We design this representation so that it reflects the feature-space distances of each datapoint to its same-class and other-class neighbors. Second, the latent-space representation of the training data is extended to the whole feature space by fitting a regression model to the transformed data. Given this latent-space representation, the calibrated classifier is readily defined. We rigorously establish its core theoretical properties and benchmark its prediction and calibration performance on various synthetic and real-world data sets from different application domains.
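To make the two-step structure concrete, the following is a minimal sketch and not the method of the paper. It assumes a strong simplification: every training point is mapped to the simplex vertex of its own class, whereas the abstract describes a representation that also reflects distances to same-class and other-class neighbors. The regression model is assumed to be ordinary linear least squares, and the confidence score (a softmax over negative latent distances) is an uncalibrated placeholder. All function names are hypothetical.

```python
import numpy as np

def simplex_vertices(n_classes):
    """Vertices of a regular (n-1)-simplex, embedded in R^n and centered at the origin."""
    eye = np.eye(n_classes)
    # Row i is e_i - (1/n) * ones; all pairwise vertex distances equal sqrt(2).
    return eye - eye.mean(axis=0)

def fit_latent_regression(X, y, vertices):
    """Step 2 (assumed linear): least-squares map from features to latent targets."""
    # Step 1, simplified: each training point is sent to its own class vertex.
    targets = vertices[y]
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X1, targets, rcond=None)
    return W

def predict(X, W, vertices):
    """Predict the class of the nearest vertex, plus a placeholder confidence."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    Z = X1 @ W  # latent representation of the query points
    # Distance from every latent point to every simplex vertex.
    d = np.linalg.norm(Z[:, None, :] - vertices[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    scores = np.exp(-d)  # assumption: softmax of negative distances, not calibrated
    conf = scores / scores.sum(axis=1, keepdims=True)
    return labels, conf
```

Calling `simplex_vertices(n)` once, then `fit_latent_regression` on the training set and `predict` on new points, mirrors the two training steps followed by classification. A non-linear regressor or a neighbor-aware latent target could replace the simplified pieces without changing this overall shape.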