A new, extremely simple ensemble-based model is proposed that uses uniformly generated axis-parallel hyper-rectangles as base models (HRBMs). Two types of HRBMs are studied: closed rectangles and corners. The main idea behind HRBMs is to consider and count the training examples inside and outside each rectangle. It is proposed to incorporate HRBMs into the gradient boosting machine (GBM). Despite the simplicity of HRBMs, it turns out that these base models allow us to construct effective ensemble-based models and to avoid overfitting. A simple method is considered for calculating the optimal regularization parameters of the ensemble-based model; these parameters can be modified explicitly at each iteration of GBM. Moreover, a new regularization called the "step height penalty" is studied in addition to the standard L1 and L2 regularizations. An extremely simple approach to interpreting the predictions of the proposed ensemble-based model by means of the well-known SHAP method is proposed. It is shown that GBM with HRBMs can be regarded as a model that extends the set of interpretable models used for explaining black-box models. Numerical experiments with real datasets illustrate the proposed GBM with HRBMs for regression and classification problems. The experiments also illustrate the computational efficiency of the proposed SHAP modifications. The code of the proposed algorithms implementing GBM with HRBMs is publicly available.
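The idea of counting training examples inside and outside a rectangle can be illustrated with a minimal sketch. It assumes squared-loss regression, closed rectangles only (corners are not shown), rectangles drawn by sampling two points uniformly within the bounding box of the training data, and a plain L2 penalty on each of the two step values; the step height penalty and the paper's parameter-selection method are not reproduced. All function names (`sample_rectangle`, `fit_hrbm`, `gbm_hrbm`, and so on) are hypothetical and do not come from the authors' published code.

```python
import numpy as np

def sample_rectangle(X, rng):
    """Draw an axis-parallel closed hyper-rectangle inside the data bounding box.

    Assumption: two corner points are sampled uniformly per coordinate,
    then sorted so that lower <= upper in every dimension.
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    a, b = rng.uniform(lo, hi), rng.uniform(lo, hi)
    return np.minimum(a, b), np.maximum(a, b)

def inside_mask(X, lower, upper):
    """Boolean mask of the examples lying inside the closed rectangle."""
    return np.all((X >= lower) & (X <= upper), axis=1)

def fit_hrbm(X, r, lower, upper, lam):
    """Fit a two-valued step function (inside / outside) to residuals r.

    Minimizing sum((r_i - v)^2) + lam * v^2 over one region gives
    v = sum(r_i) / (count + lam); lam must be > 0 so empty regions give 0.
    """
    inside = inside_mask(X, lower, upper)
    v_in = r[inside].sum() / (inside.sum() + lam)
    v_out = r[~inside].sum() / ((~inside).sum() + lam)
    return v_in, v_out

def gbm_hrbm(X, y, n_iter=200, lr=0.1, lam=1.0, seed=0):
    """Hypothetical GBM loop with rectangle base models and squared loss."""
    rng = np.random.default_rng(seed)
    f0 = y.mean()
    pred = np.full(len(y), f0)
    models = []
    for _ in range(n_iter):
        r = y - pred  # negative gradient of 0.5 * (y - f)^2
        lower, upper = sample_rectangle(X, rng)
        v_in, v_out = fit_hrbm(X, r, lower, upper, lam)
        # Store the step values already multiplied by the learning rate.
        step_in, step_out = lr * v_in, lr * v_out
        pred += np.where(inside_mask(X, lower, upper), step_in, step_out)
        models.append((lower, upper, step_in, step_out))
    return f0, models

def gbm_predict(X, f0, models):
    """Sum the constant start value and all scaled rectangle steps."""
    out = np.full(len(X), f0)
    for lower, upper, step_in, step_out in models:
        out += np.where(inside_mask(X, lower, upper), step_in, step_out)
    return out
```

Because each fitted model is just a rectangle with two constant values, the ensemble is a sum of simple step functions, which is what makes the counting-based view and its interpretation straightforward in this simplified setting.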