Classification is the task of assigning a new instance to one of a set of predefined categories based on the attributes of the instance. Classification trees are among the most commonly used techniques for this task. In this paper, we introduce a novel classification tree algorithm, which we call the Direct Nonparametric Predictive Inference (D-NPI) classification algorithm. The D-NPI algorithm is based entirely on the Nonparametric Predictive Inference (NPI) approach and does not use any other assumptions or information. NPI is a statistical methodology which learns from data in the absence of prior knowledge and uses only few modelling assumptions, enabled by the use of lower and upper probabilities to quantify uncertainty. Because NPI is explicitly predictive, it is well suited to classification, which is itself a predictive task. The D-NPI algorithm uses a new split criterion called Correct Indication (CI). The CI concerns the informativity that the attribute variables will indicate: if an attribute is very informative, it gives high lower and upper probabilities for CI. The CI thus reports, based on the data, the strength of the evidence that the attribute variables will indicate. The CI is based entirely on NPI and does not use any additional concepts such as entropy. The performance of the D-NPI classification algorithm is compared with that of several other classification algorithms in terms of classification accuracy, in-sample accuracy and tree size on different datasets from the UCI Machine Learning Repository. The experimental results indicate that the D-NPI classification algorithm performs well in terms of classification accuracy and in-sample accuracy.