Data-driven soft sensors are widely used in industrial and chemical processes to predict process variables that are hard to measure and whose true values are difficult to track during routine operation. The regression models behind these sensors often require a large number of labeled examples, yet obtaining labels can be very expensive because quality inspections are time-consuming and costly. In this context, active learning methods can be highly beneficial, since they identify the most informative samples whose labels should be queried. However, most active learning strategies proposed for regression target the offline setting. In this work, we adapt some of these approaches to the stream-based scenario and show how they can be used to select the most informative data points. We also demonstrate how a semi-supervised architecture based on orthogonal autoencoders can learn salient features in a lower-dimensional space. We compare the predictive performance of the proposed approaches on the Tennessee Eastman Process.
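The abstract does not say which offline strategies are adapted. As a rough illustration of what stream-based selection for regression can look like, the sketch below assumes one common choice, query-by-committee. A bootstrap committee of linear least-squares regressors scores each incoming sample by the variance of its members' predictions, and a label is requested only when that variance exceeds a threshold. The function names (`fit_committee`, `committee_variance`, `stream_query`), the committee size, the linear models, the threshold, and the synthetic data are all hypothetical. This is not the method evaluated on the Tennessee Eastman Process.

```python
import numpy as np

def fit_committee(X, y, n_members=10, seed=0):
    """Fit least-squares linear regressors on bootstrap resamples of the labeled set.

    Hypothetical helper: the committee size and linear models are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample with replacement
        w, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        members.append(w)
    return np.array(members)  # shape (n_members, n_features + 1)

def committee_variance(members, x):
    """Disagreement score: variance of the committee's predictions for one sample x."""
    preds = members @ np.append(x, 1.0)  # one prediction per committee member
    return float(preds.var())

def stream_query(stream, members, threshold):
    """Process samples one at a time and return the indices whose label would be requested."""
    queried = []
    for t, x in enumerate(stream):
        if committee_variance(members, x) > threshold:
            # A full implementation would refit the committee once the label arrives (omitted here).
            queried.append(t)
    return queried

# Usage on hypothetical synthetic data: y = 2x + noise, with x drawn from [0, 1].
rng = np.random.default_rng(1)
X_lab = rng.uniform(0.0, 1.0, size=(50, 1))
y_lab = 2.0 * X_lab[:, 0] + rng.normal(0.0, 0.1, size=50)
members = fit_committee(X_lab, y_lab)
stream = np.array([[0.5], [1000.0]])  # one in-range sample, one far outside the labeled data
queried = stream_query(stream, members, threshold=1.0)
print(queried)
```

Regression models provide no class-probability margin, so prediction variance across a committee serves as the uncertainty proxy here. Samples near the labeled data receive consistent predictions and are skipped. Samples far from it expose differences between the members' fitted slopes and get queried.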