FOLD-R is an automated inductive learning algorithm for learning default rules for mixed (numerical and categorical) data. It generates an (explainable) answer set programming (ASP) rule set for classification tasks. We present an improved FOLD-R algorithm, called FOLD-R++, that increases the efficiency and scalability of FOLD-R by orders of magnitude. FOLD-R++ improves upon FOLD-R without compromising or losing information in the input training data during the encoding or feature selection phase. FOLD-R++ is competitive in performance with the widely used XGBoost algorithm; however, unlike XGBoost, FOLD-R++ produces an explainable model. FOLD-R++ is also competitive in performance with the RIPPER system, and on large datasets it outperforms RIPPER. We also create a powerful toolset by combining FOLD-R++ with s(CASP), a goal-directed ASP execution engine, to make predictions on new data samples using the answer set program generated by FOLD-R++. The s(CASP) system also produces a justification for each prediction. Experiments presented in this paper show that FOLD-R++ is a significant improvement over the original FOLD-R design and that s(CASP) can make predictions efficiently.
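To make the idea of a learned default rule with exceptions concrete, here is a minimal Python sketch. It is neither FOLD-R++'s implementation nor the s(CASP) engine. The rule, its feature names (`income`, `credit_history`) and the threshold are hypothetical. In ASP-style notation it might read `approve(X) :- income(X,N), N > 50000, not ab1(X).` with the exception `ab1(X) :- credit_history(X,'bad').`, mixing a numerical test with a categorical one. The returned justification list only loosely imitates, in spirit, the kind of explanation s(CASP) gives; it is not s(CASP)'s output format.

```python
from dataclasses import dataclass
from typing import Callable

# A data sample mixing numerical and categorical features (hypothetical schema).
Sample = dict[str, object]


@dataclass
class Literal:
    description: str
    test: Callable[[Sample], bool]


@dataclass
class DefaultRule:
    head: str
    body: list[Literal]        # conditions that must all hold
    exceptions: list[Literal]  # "not ab(X)": any exception that holds blocks the default


def predict(rule: DefaultRule, x: Sample) -> tuple[bool, list[str]]:
    """Evaluate one default rule on a sample; return (prediction, justification)."""
    why: list[str] = []
    for lit in rule.body:
        if not lit.test(x):
            why.append(f"{lit.description} fails")
            return False, why
        why.append(f"{lit.description} holds")
    for exc in rule.exceptions:
        if exc.test(x):
            why.append(f"exception {exc.description} holds, so the default is blocked")
            return False, why
        why.append(f"exception {exc.description} does not hold")
    why.append(f"therefore {rule.head}")
    return True, why


# Hypothetical rule: approve by default when income is high, unless credit history is bad.
rule = DefaultRule(
    head="approve(X)",
    body=[Literal("income(X,N), N > 50000", lambda x: x["income"] > 50000)],
    exceptions=[Literal("ab1(X): credit_history(X,'bad')",
                        lambda x: x["credit_history"] == "bad")],
)

if __name__ == "__main__":
    ok, justification = predict(rule, {"income": 60000, "credit_history": "good"})
    print(ok)
    for step in justification:
        print(" -", step)
```

The exception list is what makes the rule a default: the body alone would approve every high-income sample, and the `not ab1(X)` part carves out the cases the data contradicts.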