This paper demonstrates the systematic use of combinatorial coverage for selecting and characterizing test and training sets for machine learning models. The presented work adapts combinatorial interaction testing, which has been successfully used to identify faults in software testing, to characterize data used in machine learning. The MNIST hand-written digits dataset is used to demonstrate that combinatorial coverage can be used to select test sets that stress machine learning model performance, to select training sets that lead to robust model performance, and to select data for fine-tuning models to new domains. The results thus posit combinatorial coverage as a holistic approach to training and testing for machine learning. In contrast to prior work, which has focused on coverage over the internals of neural networks, this paper considers coverage over simple features derived from inputs and outputs. It therefore addresses the case where the supplier of test and training sets for machine learning models does not have intellectual property rights to the models themselves. Finally, the paper addresses prior criticism of combinatorial coverage and provides a rebuttal advocating the use of coverage metrics in machine learning applications.
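To make the coverage notion concrete: combinatorial (t-way) coverage measures the fraction of all possible t-way combinations of feature values that appear somewhere in a dataset. The sketch below computes 2-way (pairwise) coverage over discrete features; it is a minimal illustration of the general idea, not the authors' implementation, and the toy feature domains are invented for the example.

```python
from itertools import combinations, product

def pairwise_coverage(rows, domains):
    """Fraction of all possible 2-way value combinations, across every
    pair of features, that appear in at least one row of the dataset.

    rows    -- iterable of tuples, one discrete value per feature
    domains -- list of tuples giving each feature's possible values
    """
    covered = 0
    total = 0
    # Enumerate every unordered pair of feature positions.
    for (i, di), (j, dj) in combinations(enumerate(domains), 2):
        possible = set(product(di, dj))           # all value pairs for (i, j)
        seen = {(r[i], r[j]) for r in rows}       # value pairs present in data
        covered += len(possible & seen)
        total += len(possible)
    return covered / total

# Toy example: three binary features, two rows.
# Each of the 3 feature pairs has 4 possible value combinations (12 total);
# the two rows cover (0,0) and (1,1) for each pair (6 covered) -> 0.5.
domains = [(0, 1), (0, 1), (0, 1)]
rows = [(0, 0, 0), (1, 1, 1)]
print(pairwise_coverage(rows, domains))  # -> 0.5
```

A test set with low coverage relative to the training set leaves many value interactions unexercised, which is the gap the paper proposes to measure and close when selecting data.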