ControlBurn is a Python package for constructing feature-sparse tree ensembles that support nonlinear feature selection and interpretable machine learning. Its algorithms first build large tree ensembles that prioritize basis functions with few features and then select a feature-sparse subset of these basis functions using a weighted lasso optimization criterion. The package includes visualizations for analyzing the features selected by the ensemble and their impact on predictions. ControlBurn thus offers the accuracy and flexibility of tree-ensemble models together with the interpretability of sparse generalized additive models. ControlBurn is scalable and flexible: for example, it can use warm-start continuation to compute the regularization path (prediction error for any number of selected features) for a dataset with tens of thousands of samples and hundreds of features in seconds. For larger datasets, the runtime scales linearly in the number of samples and features (up to a log factor), and the package supports acceleration using sketching. Moreover, the ControlBurn framework accommodates feature costs, feature groupings, and $\ell_0$-based regularizers. The package is user-friendly and open-source: its documentation and source code are available at https://pypi.org/project/ControlBurn/ and https://github.com/udellgroup/controlburn/.
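To make the selection step and the warm-start regularization path concrete, here is a minimal, stdlib-only sketch. It is not the ControlBurn package's API. It assumes the ensemble has already been built, that each tree is summarized by its predictions on the training set and the set of features it splits on, and that each tree's lasso weight (cost) equals the number of features it uses. The functions `weighted_lasso_nonneg`, `selected_features`, and `regularization_path` are hypothetical names, and cyclic coordinate descent with nonnegative tree weights is a solver chosen here for illustration.

```python
# Hypothetical sketch of a ControlBurn-style selection step; NOT the package API.
# Assumptions: each tree = (training-set predictions, set of features it splits on),
# and a tree's lasso cost = number of features it uses.


def weighted_lasso_nonneg(preds, y, costs, lam, n_iter=200, w0=None):
    """Cyclic coordinate descent for the nonnegative weighted lasso
        min_w  (1/2n) * ||y - sum_t w_t * preds[t]||^2 + lam * sum_t costs[t] * w_t,
        subject to w_t >= 0 for every tree t.
    """
    n_trees, n = len(preds), len(y)
    w = list(w0) if w0 is not None else [0.0] * n_trees
    # Residual r = y - (ensemble prediction under the current weights).
    r = [y[i] - sum(w[t] * preds[t][i] for t in range(n_trees)) for i in range(n)]
    # Per-tree curvature (1/n) * ||preds[t]||^2.
    sq = [sum(v * v for v in col) / n for col in preds]
    for _ in range(n_iter):
        for t, col in enumerate(preds):
            if sq[t] == 0.0:
                continue  # an all-zero tree cannot reduce the loss
            # Correlation of tree t with the residual that excludes tree t itself.
            rho = sum(c * (ri + w[t] * c) for c, ri in zip(col, r)) / n
            # Exact 1-D minimizer: shrink by the tree's cost, then clip at zero.
            new = max(0.0, (rho - lam * costs[t]) / sq[t])
            delta = new - w[t]
            if delta != 0.0:
                r = [ri - delta * c for ri, c in zip(r, col)]
                w[t] = new
    return w


def selected_features(weights, tree_features, tol=1e-12):
    """Union of the features used by trees that keep a nonzero weight."""
    chosen = set()
    for wt, feats in zip(weights, tree_features):
        if wt > tol:
            chosen |= set(feats)
    return chosen


def regularization_path(preds, y, tree_features, lams):
    """Solve for decreasing lam, warm-starting each solve from the previous weights."""
    costs = [len(f) for f in tree_features]
    path, w = [], None
    for lam in sorted(lams, reverse=True):
        w = weighted_lasso_nonneg(preds, y, costs, lam, w0=w)
        path.append((lam, sorted(selected_features(w, tree_features))))
    return path


# Toy example: tree 0 uses feature {0}, tree 1 uses {1}, and tree 2 uses {0, 2};
# tree 2 predicts exactly like tree 0 but costs twice as much.
y = [1.0, 0.0] * 5
preds = [[1.0, 0.0] * 5, [0.0, 1.0] * 5, [1.0, 0.0] * 5]
tree_features = [{0}, {1}, {0, 2}]
for lam, feats in regularization_path(preds, y, tree_features, [1.0, 0.01]):
    print(lam, feats)
```

With the large penalty no tree survives, so no features are selected. With the small penalty the solver keeps the cheaper single-feature tree 0 and drops tree 2, so only feature 0 is selected even though trees 0 and 2 fit the data equally well. This is the mechanism by which weighting the lasso by feature count favors feature-sparse ensembles, and reusing `w` across penalties is the warm-start idea applied to the path.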