This paper proposes a method for automatically creating variables, for regression problems, that complement the information contained in the initial input vector. The method works as a pre-processing step: the continuous values of the target variable are discretized into a set of intervals, whose boundaries define value thresholds. Classifiers are then trained to predict whether the target value is less than or equal to each of these thresholds. The outputs of the classifiers are concatenated into an additional vector of variables that enriches the initial input vector of the regression problem. The implemented system can thus be considered a generic pre-processing tool. We tested the proposed enrichment method with 5 types of regressors and evaluated it on 33 regression datasets. Our experimental results confirm the value of the approach.
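The enrichment step described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes equal-frequency thresholds (the abstract does not specify the discretization) and uses a tiny k-nearest-neighbour model as a stand-in for the classifiers. The names `make_thresholds`, `KNNThresholdClassifier`, `fit_threshold_classifiers`, and `enrich` are hypothetical and chosen for illustration only.

```python
from statistics import quantiles


def make_thresholds(y, n_thresholds):
    # Assumption: equal-frequency cut points over the target values.
    # quantiles(..., n=k+1) returns k cut points.
    return quantiles(y, n=n_thresholds + 1)


class KNNThresholdClassifier:
    """Stand-in binary classifier: estimates P(y <= threshold | x) by k-NN."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, labels):
        self.X = [list(row) for row in X]
        self.labels = list(labels)
        return self

    def predict_proba(self, x):
        # Squared Euclidean distance to every training point, nearest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(row, x)), label)
            for row, label in zip(self.X, self.labels)
        )
        nearest = dists[: self.k]
        # Fraction of the k nearest neighbours whose target is <= threshold.
        return sum(label for _, label in nearest) / len(nearest)


def fit_threshold_classifiers(X, y, n_thresholds=3, k=3):
    # One binary problem per threshold: label is 1 when the target <= threshold.
    thresholds = make_thresholds(y, n_thresholds)
    classifiers = [
        KNNThresholdClassifier(k).fit(X, [1 if target <= t else 0 for target in y])
        for t in thresholds
    ]
    return thresholds, classifiers


def enrich(X, classifiers):
    # Concatenate the classifier outputs to each original input vector.
    return [list(x) + [clf.predict_proba(x) for clf in classifiers] for x in X]


if __name__ == "__main__":
    X = [[float(i)] for i in range(12)]
    y = [2.0 * i for i in range(12)]
    thresholds, classifiers = fit_threshold_classifiers(X, y)
    for row in enrich(X, classifiers)[:3]:
        print(row)
```

The enriched rows can then be passed to any regressor in place of the original inputs. In this sketch the classifiers are fitted and applied on the same data for brevity; in practice the added features for training rows would likely need to come from held-out predictions to avoid leaking the target.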