In our experience working with domain experts who use today's AutoML systems, a common problem we encountered is what we call "unrealistic expectations": users face a very challenging task with a noisy data acquisition process, yet expect machine learning (ML) to achieve startlingly high accuracy. Many such projects are doomed to fail from the start. In traditional software engineering, this problem is addressed via a feasibility study, an indispensable step before developing any software system. In this paper, we present Snoopy, which aims to support data scientists and machine learning engineers in performing a systematic and theoretically founded feasibility study before building ML applications. We approach this problem by estimating the irreducible error of the underlying task, also known as the Bayes error rate (BER), which stems from data quality issues in the datasets used to train or evaluate ML model artifacts. We design a practical Bayes error estimator and compare it against baseline feasibility-study candidates on six datasets (with additional real and synthetic noise of different levels) in computer vision and natural language processing. Furthermore, by incorporating our systematic feasibility study, together with additional signals, into the iterative label cleaning process, we demonstrate in end-to-end experiments that users can save substantial labeling time and money.
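As background for the quantity Snoopy estimates, the Bayes error rate has a standard definition; the formulation below is general textbook background under the usual classification setting, not notation or an estimator taken from this paper. For a task with features $X$ and labels $Y$ taking values in a finite set $\mathcal{Y}$, the BER is the error of the optimal (Bayes) classifier:

\[
  R^{*} \;=\; \mathbb{E}_{X}\!\left[\, 1 - \max_{y \in \mathcal{Y}} \Pr(Y = y \mid X) \,\right].
\]

No classifier can achieve accuracy above $1 - R^{*}$, which is why an estimate of $R^{*}$ can flag an unrealistic accuracy target before any model is trained.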