Machine learning is now a standard technique for data analysis within software applications. Software engineers need quality assurance techniques that are suitable for these new kinds of systems. In this article, we discuss whether standard software testing techniques that have been part of textbooks for decades are also useful for testing machine learning software. Concretely, we try to determine generic and simple smoke tests that can be used to assert that basic functions can be executed without crashing. We found that we can derive such tests using techniques similar to equivalence classes and boundary value analysis. Moreover, we found that these concepts can also be applied to hyperparameters to further improve the quality of the smoke tests. Even though our approach is almost trivial, we were able to find bugs in all three machine learning libraries that we tested and severe bugs in two of the three. This demonstrates that common software testing techniques are still valid in the age of machine learning and that considering how they can be adapted to this new context can help to find and prevent severe bugs, even in mature machine learning libraries.
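To make the idea of such smoke tests concrete, the following is a minimal sketch and not the test suite used in the article. It assumes a hypothetical toy classifier (`ToyKNN`) in place of a real library, hypothetical helpers (`constant_data`, `smoke_test`, `run_smoke_tests`), and an illustrative choice of boundary values for both the feature data and the hyperparameter `k`. Each test only checks that training and prediction run without crashing.

```python
import sys

MAX = sys.float_info.max   # largest finite float
MIN = sys.float_info.min   # smallest positive normalized float


def sq_dist(a, b):
    """Squared Euclidean distance; uses d * d, which overflows to inf
    instead of raising OverflowError like d ** 2 would."""
    return sum((u - v) * (u - v) for u, v in zip(a, b))


class ToyKNN:
    """Hypothetical stand-in for a library classifier under test."""

    def __init__(self, k=1):
        self.k = k  # hyperparameter: number of neighbors

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            ranked = sorted(range(len(self.X_)),
                            key=lambda i: sq_dist(row, self.X_[i]))
            votes = [self.y_[i] for i in ranked[: self.k]]
            predictions.append(max(set(votes), key=votes.count))
        return predictions


def constant_data(value, n_samples=6, n_features=2):
    """Build a dataset in which every feature equals one boundary value."""
    X = [[value] * n_features for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # two alternating classes
    return X, y


def smoke_test(make_model, X, y):
    """Return None if fit and predict run cleanly, else the exception name."""
    try:
        predictions = make_model().fit(X, y).predict(X)
        assert len(predictions) == len(X)
    except Exception as exc:
        return type(exc).__name__
    return None


# Illustrative boundary values for the data, one per equivalence class.
DATA_CASES = {
    "zeros": 0.0,
    "largest_float": MAX,
    "smallest_normal_float": MIN,
    "most_negative_float": -MAX,
}
# Illustrative hyperparameter boundaries: k = 1 and k = n_samples.
K_CASES = [1, 6]


def run_smoke_tests():
    """Combine every data case with every hyperparameter case."""
    results = {}
    for name, value in DATA_CASES.items():
        X, y = constant_data(value)
        for k in K_CASES:
            results[(name, k)] = smoke_test(lambda: ToyKNN(k=k), X, y)
    return results


if __name__ == "__main__":
    for case, outcome in run_smoke_tests().items():
        print(case, "ok" if outcome is None else f"crashed: {outcome}")
```

Against a real library, `ToyKNN` would be swapped for the estimator under test, and the lists of data and hyperparameter cases would be extended with whatever boundaries matter for that estimator.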