Smoke Testing for Machine Learning: Simple Tests to Discover Severe Defects

Machine learning is now a standard technique for data analysis within software applications, and software engineers need quality assurance techniques suited to these new kinds of systems. In this article, we discuss whether standard software testing techniques that have been part of textbooks for decades are also useful for testing machine learning software. Concretely, we try to determine generic and simple smoke tests that can be used to assert that basic functions can be executed without crashing. We found that we can derive such tests using techniques similar to equivalence classes and boundary value analysis. Moreover, we found that these concepts can also be applied to hyperparameters to further improve the quality of the smoke tests. Even though our approach is almost trivial, we were able to find bugs in all three machine learning libraries that we tested, and severe bugs in two of the three. This demonstrates that common software testing techniques remain valid in the age of machine learning, and that considering how they can be adapted to this new context can help to find and prevent severe bugs, even in mature machine learning libraries.
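The idea of combining boundary-value inputs with equivalence classes of hyperparameters can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation or any library's API: `NearestCentroid` is a toy pure-Python classifier standing in for a real estimator, `boundary_datasets` builds three hypothetical boundary datasets, and `metric` is a hypothetical hyperparameter with two equivalence classes. The smoke test only checks that training and prediction complete and return one label per sample.

```python
import itertools
import sys


class NearestCentroid:
    """Toy classifier (hypothetical stand-in for a real library estimator)."""

    def __init__(self, metric="euclidean"):
        if metric not in ("euclidean", "manhattan"):
            raise ValueError(f"unknown metric: {metric}")
        self.metric = metric
        self.centroids_ = {}

    def _distance(self, a, b):
        if self.metric == "euclidean":
            # Squared Euclidean distance; float ** can raise on overflow.
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return sum(abs(x - y) for x, y in zip(a, b))

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        # One centroid per class: the column-wise mean of its rows.
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids_,
                key=lambda c: self._distance(row, self.centroids_[c]))
            for row in X
        ]


def boundary_datasets():
    """Hypothetical boundary-value datasets: zeros, largest and smallest floats."""
    def make(value):
        X = [[value, value], [value, value], [0.0, 0.0], [0.0, 0.0]]
        y = [1, 1, 0, 0]
        return X, y

    return {
        "zeros": make(0.0),
        "max_float": make(sys.float_info.max),
        "min_float": make(sys.float_info.min),
    }


def smoke_test(datasets, metrics):
    """Run fit/predict for every dataset and hyperparameter value.

    Returns a dict mapping (dataset name, metric) to "ok" or to the name
    of the exception that was raised.
    """
    results = {}
    for (name, (X, y)), metric in itertools.product(datasets.items(), metrics):
        try:
            preds = NearestCentroid(metric=metric).fit(X, y).predict(X)
            if len(preds) != len(X):
                raise ValueError("prediction count does not match sample count")
            results[(name, metric)] = "ok"
        except Exception as exc:  # a smoke test records any crash
            results[(name, metric)] = type(exc).__name__
    return results


if __name__ == "__main__":
    for key, outcome in smoke_test(boundary_datasets(),
                                   ["euclidean", "manhattan"]).items():
        print(key, outcome)
```

In this toy setup the Euclidean variant crashes with an `OverflowError` on the largest-float dataset, because squaring a float near the representable maximum raises in Python, while the Manhattan variant handles the same data. This shows why it can pay to repeat the same boundary inputs across hyperparameter settings: a crash may appear under only one of them.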

