A growing number of publications present the joint application of Design of Experiments (DOE) and machine learning (ML) as a methodology for collecting and analyzing data on a specific industrial phenomenon. However, the literature shows that the choice of design for data collection and of model for data analysis is often driven by incidental factors rather than by statistical or algorithmic advantages. As a result, few studies provide guidelines on which designs and ML models to use jointly for data collection and analysis. This paper is the first in the literature to discuss the choice of design in relation to ML model performance. The extensive study considers 12 experimental designs, 7 families of predictive models, 7 test functions that emulate physical processes, and 8 noise settings, both homoscedastic and heteroscedastic. The results can have an immediate impact on the work of practitioners by providing guidelines for practical applications of DOE and ML.
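To give a sense of the scale and ingredients of such a study, the following is a minimal sketch and not the authors' code. Only the four counts come from the abstract. The full-factorial design, the function `toy_response`, the noise model in `noisy_response`, and the assumption that all four factors are fully crossed are illustrative choices that the abstract does not confirm.

```python
import itertools
import random

# Counts taken from the abstract; the integer labels are placeholders.
N_DESIGNS, N_MODELS, N_FUNCTIONS, N_NOISE = 12, 7, 7, 8


def study_grid():
    """Enumerate every (design, model, function, noise) combination,
    assuming the study is fully crossed (an assumption, not stated)."""
    return list(itertools.product(
        range(N_DESIGNS), range(N_MODELS), range(N_FUNCTIONS), range(N_NOISE)))


def full_factorial(levels, n_factors):
    """Full factorial design on [0, 1]^n_factors (one illustrative design)."""
    grid = [i / (levels - 1) for i in range(levels)]
    return list(itertools.product(grid, repeat=n_factors))


def toy_response(x):
    """Hypothetical smooth test function, not one of the paper's seven."""
    return sum((xi - 0.5) ** 2 for xi in x)


def noisy_response(x, rng, sigma=0.1, heteroscedastic=False):
    """Add Gaussian noise to the response.

    Homoscedastic: constant standard deviation sigma.
    Heteroscedastic: standard deviation grows with the first factor.
    """
    scale = sigma * (1.0 + x[0]) if heteroscedastic else sigma
    return toy_response(x) + rng.gauss(0.0, scale)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
design = full_factorial(levels=3, n_factors=2)
observations = [noisy_response(x, rng, heteroscedastic=True) for x in design]
```

Under the fully crossed assumption, `len(study_grid())` is 12 × 7 × 7 × 8 = 4704 combinations, which illustrates why guidance on pairing designs with models is hard to obtain from ad hoc case studies.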