In recent years, many non-traditional classification methods, such as random forests, boosting, and neural networks, have been widely used in applications. Their performance is typically measured in terms of classification accuracy. While the classification error rate and similar metrics are important, they do not address a fundamental question: is the classification method underfitted? To the best of our knowledge, there is no existing method that can assess the goodness of fit of a general classification procedure. Indeed, the lack of a parametric assumption makes it challenging to construct proper tests. To overcome this difficulty, we propose a methodology called BAGofT that splits the data into a training set and a validation set. First, the classification procedure to be assessed is applied to the training set, which is also used to adaptively find a data grouping that reveals the most severe regions of underfitting. Then, based on this grouping, we calculate a test statistic by comparing the estimated success probabilities with the actual observed responses from the validation set. The data splitting guarantees that the size of the test is controlled under the null hypothesis, and that the power of the test goes to one as the sample size increases under the alternative hypothesis. For testing parametric classification models, BAGofT has a broader scope than existing methods, since it is not restricted to specific parametric models (e.g., logistic regression). Extensive simulation studies demonstrate the utility of BAGofT for assessing general classification procedures and its strengths over some existing methods for testing parametric classification models.