Data-driven soft sensors are widely used in industrial and chemical processes to predict hard-to-measure process variables whose true values are difficult to track during routine operations. The regression models behind these sensors often require a large number of labeled examples, yet labels are expensive to obtain because quality inspections are time-consuming and costly. In this context, active learning methods can be highly beneficial, as they suggest which data points are most informative to label. However, most active learning strategies proposed for regression focus on the offline setting. In this work, we adapt some of these approaches to the stream-based scenario and show how they can be used to select the most informative data points. We also demonstrate how to use a semi-supervised architecture based on orthogonal autoencoders to learn salient features in a lower-dimensional space. The Tennessee Eastman Process is used to compare the predictive performance of the proposed approaches.
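To make the stream-based setting concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes a query-by-committee criterion: a bootstrap ensemble of ridge-regularized linear models is kept, and a label is requested only when the variance of the ensemble predictions at an incoming point exceeds a threshold. All names (`fit_committee`, `committee_variance`, `stream_active_learning`, `oracle`, `threshold`, `budget`) are hypothetical and chosen for illustration.

```python
import numpy as np


def fit_committee(X, y, n_members=10, ridge=1e-3, rng=None):
    """Fit ridge-regularized linear models on bootstrap resamples of (X, y)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # append an intercept column
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)  # bootstrap indices, with replacement
        # The small ridge term (applied to all coefficients, intercept included)
        # keeps the normal equations solvable on degenerate resamples.
        A = Xb[idx].T @ Xb[idx] + ridge * np.eye(d + 1)
        members.append(np.linalg.solve(A, Xb[idx].T @ y[idx]))
    return np.array(members)  # shape: (n_members, d + 1)


def committee_variance(members, x):
    """Variance of the committee predictions at a single point x."""
    xb = np.append(x, 1.0)
    return float(np.var(members @ xb))


def stream_active_learning(X_init, y_init, stream, oracle, threshold, budget, seed=0):
    """Visit each stream point once; query the oracle when disagreement is high."""
    X_lab, y_lab = X_init.copy(), y_init.copy()
    members = fit_committee(X_lab, y_lab, rng=seed)
    queried = []  # stream indices at which a label was requested
    for t, x in enumerate(stream):
        if len(queried) >= budget:
            break  # labeling budget exhausted
        if committee_variance(members, x) > threshold:
            # Assumed query rule: disagreement above a fixed threshold.
            X_lab = np.vstack([X_lab, x])
            y_lab = np.append(y_lab, oracle(x))
            members = fit_committee(X_lab, y_lab, rng=seed + t + 1)
            queried.append(t)
    return members, queried
```

Unlike pool-based selection, the decision for each point is made immediately and cannot be revisited, which is why the sketch uses a threshold rather than a ranking over candidates. In practice the threshold would need tuning against the available labeling budget.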