Industrial applications of machine learning face unique challenges due to the nature of raw industrial data. Preprocessing and preparing raw industrial data for machine learning is a demanding task that often takes more time and effort than the modeling itself, and it poses challenges of its own. This paper addresses one of those challenges: missing values caused by sensor unavailability at different production units of nonlinear production lines. When only a small proportion of the data is missing, the missing values can often be imputed. When a large proportion is missing, imputation is often not feasible, and removing the observations that contain missing values is often the only option. This paper presents a technique that makes use of all of the available data, without removing the large numbers of observations for which data is only partially available. We discuss the principal idea of the method and also show different possible implementations that can be applied depending on the data at hand. Finally, we demonstrate the method on data from a steel production plant.