In practice, machine learning (ML) workflows involve many steps, from data preprocessing and missing-value imputation to model selection, model tuning, and model evaluation. Many of these steps rely on human ML experts. AutoML, the field of automating these ML pipelines, aims to let practitioners apply ML off the shelf without expert knowledge. Most modern AutoML systems, such as auto-sklearn, H2O-AutoML, or TPOT, aim for high predictive performance and therefore generate ensembles that consist almost exclusively of black-box models. This makes the resulting models harder for a layperson to interpret and adds another layer of opacity for users. We propose an AutoML system that constructs an interpretable additive model, which can be fitted using a highly scalable componentwise boosting algorithm. Our system provides tools for easy model interpretation, such as visualizing partial effects and pairwise interactions, allows for a straightforward calculation of feature importance, and gives insights into the model complexity required to fit the given task. We introduce the general framework and outline its implementation, autocompboost. To demonstrate the framework's efficacy, we compare autocompboost to other existing systems on the OpenML AutoML-Benchmark. Despite its restriction to an interpretable model space, our system is competitive in terms of predictive performance on most data sets while being more user-friendly and transparent.
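To illustrate the idea of componentwise boosting mentioned in the abstract, the following is a minimal sketch and not the autocompboost implementation. It assumes a squared-error loss and one univariate linear base learner per feature; the function name `componentwise_boost`, its parameters, and the synthetic data are hypothetical and chosen only for this example. In each iteration, every base learner is fitted to the current residuals, only the best-fitting one is updated, and the risk reduction credited to its feature serves as a simple feature importance.

```python
import random


def componentwise_boost(X, y, n_iter=200, lr=0.1):
    """Componentwise L2 boosting with one linear base learner per feature.

    Illustrative sketch only. X is a list of rows, y a list of targets.
    Returns (intercept, coefs, importance).
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    means = [sum(c) / n for c in cols]
    # Center each feature so a base learner needs no intercept of its own.
    centered = [[v - m for v in c] for c, m in zip(cols, means)]
    sq_norms = [sum(v * v for v in c) for c in centered]

    offset = sum(y) / n                  # initial model: the mean of y
    coefs = [0.0] * p
    importance = [0.0] * p               # accumulated risk reduction per feature
    resid = [yi - offset for yi in y]

    for _ in range(n_iter):
        rss_before = sum(r * r for r in resid)
        best_j, best_beta, best_rss = None, 0.0, rss_before
        for j in range(p):
            if sq_norms[j] == 0.0:       # constant feature, nothing to fit
                continue
            # Least-squares slope of the residuals on feature j alone.
            beta = sum(c * r for c, r in zip(centered[j], resid)) / sq_norms[j]
            rss = rss_before - beta * beta * sq_norms[j]
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss
        if best_j is None:               # no base learner improves the fit
            break
        # Update only the selected component, shrunk by the learning rate.
        step = lr * best_beta
        coefs[best_j] += step
        resid = [r - step * c for r, c in zip(resid, centered[best_j])]
        importance[best_j] += rss_before - sum(r * r for r in resid)

    # Undo the centering to express the model on the original feature scale.
    intercept = offset - sum(b * m for b, m in zip(coefs, means))
    return intercept, coefs, importance


# Synthetic data (assumption): y depends on features 0 and 1, feature 2 is noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [1.0 + 3.0 * a - 2.0 * b + rng.gauss(0, 0.1) for a, b, _ in X]

intercept, coefs, importance = componentwise_boost(X, y)
print("intercept:", intercept)
print("coefficients:", coefs)
print("importance:", importance)
```

Because each feature has its own additive component, the fitted coefficients can be read as partial effects, and a feature that is never selected keeps a coefficient of exactly zero. A fuller system would presumably use richer base learners (for example, nonlinear or interaction terms), which this sketch leaves out.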