An increasing number of publications present the joint application of Design of Experiments (DOE) and machine learning (ML) as a methodology for collecting and analyzing data on a specific industrial phenomenon. However, the literature shows that the choice of design for data collection and of model for data analysis is often not driven by statistical or algorithmic advantages, and few studies provide guidelines on which designs and ML models to use jointly for data collection and analysis. This article discusses the choice of design in relation to ML model performance. The study considers 12 experimental designs, 7 families of predictive models, 7 test functions that emulate physical processes, and 8 noise settings, both homoscedastic and heteroscedastic. The results can have an immediate impact on practitioners' work, providing guidelines for practical applications of DOE and ML.