Recently, many improved naive Bayes (NB) methods with enhanced discrimination capabilities have been developed. Among them, regularized naive Bayes (RNB) performs excellently by balancing discrimination power and generalization capability. Data discretization is important in naive Bayes: by grouping similar values into one interval, the data distribution can be estimated more reliably. However, existing methods, including RNB, often discretize the data into too few intervals, which may cause significant information loss. To address this problem, we propose a semi-supervised adaptive discriminative discretization framework for naive Bayes, which better estimates the data distribution by using both labeled and unlabeled data through pseudo-labeling techniques. The proposed method also significantly reduces information loss during discretization through an adaptive discriminative discretization scheme, and hence greatly improves the discrimination power of classifiers. The proposed RNB+, i.e., regularized naive Bayes using the proposed discretization framework, is systematically evaluated on a wide range of machine-learning datasets, where it significantly and consistently outperforms state-of-the-art NB classifiers.
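To make the general pipeline concrete (discretize continuous features into intervals, fit a naive Bayes model on the interval counts, and use pseudo-labels to bring unlabeled data into the estimate), here is a minimal sketch. It is not RNB+ and not the paper's adaptive discriminative discretization scheme: it assumes plain unsupervised equal-frequency binning computed over labeled plus unlabeled values, a Laplace-smoothed categorical naive Bayes, and a single round of confidence-thresholded self-training. All function names and parameters (`equal_frequency_cuts`, `semi_supervised_nb`, `n_bins`, `threshold`, `alpha`) are hypothetical.

```python
import bisect
import math
from collections import Counter, defaultdict

def equal_frequency_cuts(values, n_bins):
    """Hypothetical helper: cut points splitting sorted values into ~equal-count intervals."""
    s = sorted(values)
    cuts = []
    for k in range(1, n_bins):
        c = s[(k * len(s)) // n_bins]
        if not cuts or c > cuts[-1]:  # keep cut points strictly increasing
            cuts.append(c)
    return cuts

def discretize(row, cuts_per_feature):
    """Map each continuous value to the index of the interval it falls into."""
    return tuple(bisect.bisect_right(cuts, v) for v, cuts in zip(row, cuts_per_feature))

def fit_nb(rows, labels, n_bins, alpha=1.0):
    """Laplace-smoothed categorical naive Bayes over interval indices."""
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    counts = defaultdict(Counter)  # counts[(feature, class)][interval] = frequency
    for row, c in zip(rows, labels):
        for j, b in enumerate(row):
            counts[(j, c)][b] += 1
    n = len(labels)

    def log_scores(row):
        scores = {}
        for c in classes:
            s = math.log((class_counts[c] + alpha) / (n + alpha * len(classes)))
            for j, b in enumerate(row):
                s += math.log((counts[(j, c)][b] + alpha) / (class_counts[c] + alpha * n_bins[j]))
            scores[c] = s
        return scores
    return log_scores

def predict_proba(model, drow):
    """Normalize log scores into class posteriors (numerically stable softmax)."""
    scores = model(drow)
    m = max(scores.values())
    z = sum(math.exp(v - m) for v in scores.values())
    return {c: math.exp(v - m) / z for c, v in scores.items()}

def semi_supervised_nb(X_lab, y_lab, X_unl, n_bins=4, threshold=0.9):
    """Assumed scheme: unsupervised cuts on all data, then one self-training round."""
    all_rows = list(X_lab) + list(X_unl)
    cuts = [equal_frequency_cuts([r[j] for r in all_rows], n_bins)
            for j in range(len(X_lab[0]))]
    bins = [len(c) + 1 for c in cuts]
    d_lab = [discretize(r, cuts) for r in X_lab]
    model = fit_nb(d_lab, y_lab, bins)
    # Pseudo-label unlabeled rows whose top posterior clears the threshold.
    pseudo_rows, pseudo_labels = [], []
    for r in X_unl:
        dr = discretize(r, cuts)
        p = predict_proba(model, dr)
        c = max(p, key=p.get)
        if p[c] >= threshold:
            pseudo_rows.append(dr)
            pseudo_labels.append(c)
    model = fit_nb(d_lab + pseudo_rows, list(y_lab) + pseudo_labels, bins)
    return model, cuts

def predict(model, cuts, row):
    p = predict_proba(model, discretize(row, cuts))
    return max(p, key=p.get)

X_lab, y_lab = [[0.1], [0.3], [9.5], [9.8]], [0, 0, 1, 1]
X_unl = [[0.2], [0.4], [9.6], [9.9], [0.5], [9.7]]
model, cuts = semi_supervised_nb(X_lab, y_lab, X_unl, n_bins=4, threshold=0.6)
print(cuts, predict(model, cuts, [0.25]), predict(model, cuts, [9.85]))
```

In this toy run the cut points come from all ten values, so the unlabeled rows shape the intervals even before any pseudo-labels are assigned. The number of intervals (`n_bins`) is fixed by hand here, which is exactly the kind of choice the abstract argues should instead be made adaptively and discriminatively.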