FOLD-SE: An Efficient Rule-based Machine Learning Algorithm with Scalable Explainability

We present FOLD-SE, an efficient, explainable machine learning algorithm for classification tasks on tabular data containing numerical and categorical values. FOLD-SE generates a set of default rules (essentially a stratified normal logic program) as an explainable trained model. The explainability provided by FOLD-SE is scalable: regardless of the size of the dataset, the number of learned rules and learned literals stays quite small while good classification accuracy is maintained. A model with a smaller number of rules and literals is easier for human beings to understand. FOLD-SE is competitive with state-of-the-art machine learning algorithms such as XGBoost and Multi-Layer Perceptrons (MLP) with respect to prediction accuracy. However, unlike XGBoost and MLP, the FOLD-SE algorithm is explainable. The FOLD-SE algorithm builds upon our earlier work on the explainable FOLD-R++ machine learning algorithm for binary classification and inherits all of its positive features. Thus, pre-processing of the dataset with techniques such as one-hot encoding is not needed. Like FOLD-R++, FOLD-SE uses prefix sums to speed up computations, which makes FOLD-SE an order of magnitude faster than XGBoost and MLP in execution speed. The FOLD-SE algorithm outperforms FOLD-R++, as well as other rule-learning algorithms such as RIPPER, in efficiency, performance, and scalability, especially on large datasets. A major reason for the scalable explainability of FOLD-SE is its use of a literal selection heuristic based on Gini Impurity, as opposed to the Information Gain heuristic used in FOLD-R++. A multi-category classification version of FOLD-SE is also presented.
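To make the roles of prefix sums and Gini Impurity concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary labels and a single numerical feature, and it scores each candidate literal `value <= threshold` with the textbook weighted Gini impurity of the two resulting groups; the exact heuristic used in FOLD-SE may differ. The function names `gini` and `best_threshold` are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def gini(pos, neg):
    """Gini impurity of a group containing `pos` positive and `neg` negative examples."""
    total = pos + neg
    if total == 0:
        return 0.0
    p = pos / total
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, score) for the best literal `value <= threshold`.

    `score` is the size-weighted Gini impurity of the two groups produced
    by the split (lower is better). Hypothetical helper, for illustration only.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    # pos_prefix[i] = number of positive labels among the first i sorted rows.
    # Building it once lets every candidate split be scored in O(1).
    pos_prefix = [0] + list(accumulate(1 if y else 0 for _, y in pairs))
    total_pos = pos_prefix[n]

    best = (None, float("inf"))
    for i in range(1, n):
        # A threshold can only separate rows with distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left_pos = pos_prefix[i]
        left_neg = i - left_pos
        right_pos = total_pos - left_pos
        right_neg = (n - i) - right_pos
        score = (i * gini(left_pos, left_neg)
                 + (n - i) * gini(right_pos, right_neg)) / n
        if score < best[1]:
            best = (pairs[i - 1][0], score)
    return best


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6]
    labels = [True, True, True, False, False, False]
    print(best_threshold(values, labels))
```

After one sort, the prefix-sum array gives the class counts on both sides of every candidate threshold without rescanning the data, so the scan over thresholds is linear in the number of rows. This is the kind of saving the abstract attributes to prefix sums, shown here under the simplifying assumptions stated above.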
