Automatically Debugging AutoML Pipelines Using Maro: ML Automated Remediation Oracle (Extended Version)

Machine learning in practice often involves complex pipelines for data cleansing, feature engineering, preprocessing, and prediction. These pipelines are composed of operators, which have to be correctly connected and whose hyperparameters must be correctly configured. Unfortunately, it is quite common for certain combinations of datasets, operators, or hyperparameters to cause failures. Diagnosing and fixing those failures is tedious and error-prone and can seriously derail a data scientist's workflow. This paper describes an approach for automatically debugging an ML pipeline, explaining the failures, and producing a remediation. We implemented our approach, which builds on a combination of AutoML and SMT, in a tool called Maro. Maro works seamlessly with the familiar data science ecosystem including Python, Jupyter notebooks, scikit-learn, and AutoML tools such as Hyperopt. We empirically evaluate our tool and find that for most cases, a single remediation automatically fixes errors, produces no additional faults, and does not significantly impact optimal accuracy nor time to convergence.
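
As an illustration of the kind of failure the abstract describes, here is a minimal toy sketch in plain Python. It is not Maro's method: the abstract says Maro combines AutoML and SMT, whereas this sketch uses a brute-force search as a stand-in. The search space, the two constraints (loosely modeled on familiar scikit-learn restrictions), and the names `SEARCH_SPACE`, `check_pipeline`, `is_valid`, and `remediate` are all hypothetical and do not come from the paper.

```python
from itertools import product

# Hypothetical search space: each hyperparameter and its candidate values.
SEARCH_SPACE = {
    "n_components": [2, 4, 8, 16],
    "penalty": ["l1", "l2"],
    "solver": ["lbfgs", "liblinear"],
}


def check_pipeline(config, n_features):
    """Raise ValueError if this toy pipeline configuration would fail."""
    # Assumed constraint: a dimensionality reducer cannot keep more
    # components than the dataset has features.
    if config["n_components"] > n_features:
        raise ValueError("n_components exceeds the number of input features")
    # Assumed constraint: this solver does not support this penalty.
    if config["penalty"] == "l1" and config["solver"] == "lbfgs":
        raise ValueError("solver 'lbfgs' does not support the 'l1' penalty")


def is_valid(config, n_features):
    """Return True if the configuration passes all toy constraints."""
    try:
        check_pipeline(config, n_features)
        return True
    except ValueError:
        return False


def remediate(config, n_features):
    """Return a valid configuration that changes the fewest hyperparameters.

    Brute-force enumeration; returns None if no valid configuration exists.
    """
    names = list(SEARCH_SPACE)
    best = None
    for values in product(*(SEARCH_SPACE[name] for name in names)):
        candidate = dict(zip(names, values))
        if not is_valid(candidate, n_features):
            continue
        changes = sum(candidate[name] != config[name] for name in names)
        if best is None or changes < best[0]:
            best = (changes, candidate)
    return None if best is None else best[1]


# A configuration that violates both constraints on an 8-feature dataset.
failing = {"n_components": 16, "penalty": "l1", "solver": "lbfgs"}
fixed = remediate(failing, n_features=8)
print("original valid:", is_valid(failing, 8))
print("suggested remediation:", fixed)
```

Exhaustive enumeration is only workable for a tiny search space like this one. The point of the sketch is the shape of the problem: a configuration fails because of an interaction between the dataset and the hyperparameters, and a remediation is a nearby configuration that avoids the failure.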
