Training data for machine learning (ML) classification and regression models is often scarce in industrial production, especially for time-consuming or sparsely run manufacturing processes. Most of the limited ground-truth data is used for training, leaving only a handful of samples for testing. That number of test samples is inadequate to properly evaluate the robustness of the ML models under test. Furthermore, the output of these models may be inaccurate, or the models may fail outright, if the input data differs from what is expected. This is the case for ML models used in the Electroslag Remelting (ESR) process in the refined steel industry to predict the pressure in a vacuum chamber. A vacuum pumping event occurs once per workday, so a year of pumping yields only a few hundred samples for training and testing. In the absence of adequate training and test samples, this paper first presents a method to generate a fresh set of augmented samples based on vacuum pumping principles. Based on the generated augmented samples, three test scenarios and one test oracle are presented to assess the robustness of an ML model used in industrial-scale production. Experiments are conducted with real industrial production data obtained from the steel company Uddeholms AB. The evaluations indicate that, under the proposed testing strategy, the Ensemble and Neural Network models are the most robust when trained on augmented data. The evaluation also demonstrates the effectiveness of the proposed method in checking and improving the robustness of ML algorithms in such situations. The work advances the state of the art in robustness testing of ML software in similar settings. Finally, the paper presents an MLOps implementation of the proposed approach, with real-time ML model prediction and action on the edge node and automated continuous delivery of ML software from the cloud.
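The abstract does not spell out the augmentation procedure or the test oracle, so the following is only a rough illustration of the general idea, not the authors' method. It assumes the textbook ideal pump-down relation, in which chamber pressure decays exponentially towards an ultimate pressure, and generates synthetic curves by jittering the physical parameters. All function names, parameter ranges, units, and the monotonicity oracle are hypothetical choices made for this sketch.

```python
import math
import random


def pump_down_pressure(t, p0, p_ult, speed, volume):
    """Ideal pump-down (assumed model): exponential decay from p0 towards p_ult."""
    return p_ult + (p0 - p_ult) * math.exp(-speed * t / volume)


def augment_curves(n_curves, times, seed=0):
    """Generate synthetic pressure curves by jittering the physical parameters.

    The parameter ranges below are illustrative assumptions, not plant data.
    """
    rng = random.Random(seed)
    curves = []
    for _ in range(n_curves):
        p0 = rng.uniform(950.0, 1050.0)   # start pressure, mbar
        p_ult = rng.uniform(0.05, 0.5)    # ultimate pressure, mbar
        speed = rng.uniform(0.8, 1.2)     # pumping speed, m^3/min
        volume = rng.uniform(9.0, 11.0)   # chamber volume, m^3
        curves.append(
            [pump_down_pressure(t, p0, p_ult, speed, volume) for t in times]
        )
    return curves


def oracle_monotone_and_bounded(curve, p_max=1050.0):
    """Hypothetical oracle: pressure never rises and stays within (0, p_max]."""
    non_increasing = all(b <= a for a, b in zip(curve, curve[1:]))
    bounded = all(0.0 < p <= p_max for p in curve)
    return non_increasing and bounded


times = [float(t) for t in range(0, 61, 5)]  # minutes since pumping started
curves = augment_curves(100, times)
passed = sum(oracle_monotone_and_bounded(c) for c in curves)
print(f"{passed}/{len(curves)} augmented curves satisfy the oracle")
```

In a robustness test, the same kind of oracle could be applied to a trained model's predicted pressure sequence on augmented inputs, flagging predictions that violate the physical expectation.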