Incomplete sensor data is a major obstacle in industrial time-series analytics. In wastewater treatment plants (WWTPs), key sensors show long, irregular gaps caused by fouling, maintenance, and outages. We introduce STDiff and STDiff-W, diffusion-based imputers that cast gap filling as state-space simulation under partial observability, where targets, controls, and exogenous signals may all be intermittently missing. STDiff learns a one-step transition model conditioned on observed values and masks, while STDiff-W extends it with a context encoder that jointly inpaints contiguous blocks, combining long-range consistency with short-term detail. On two WWTP datasets (Agtrup, with synthetic block gaps, and Avedøre, with natural outages), STDiff-W achieves state-of-the-art imputation accuracy against strong neural baselines such as SAITS, BRITS, and CSDI. Beyond point-error metrics, its reconstructions preserve realistic dynamics, including oscillations, spikes, and regime shifts, and they achieve top or tied-top performance in downstream one-step forecasting, indicating that preserving dynamics does not come at the expense of predictive utility. Ablation studies that drop, shuffle, or add noise to control or exogenous inputs consistently degrade NH4 and PO4 performance, with the largest deterioration when exogenous signals are removed, showing that the model captures meaningful dependencies. We conclude with practical guidance for deployment: evaluate performance beyond MAE using task-oriented and visual checks, include exogenous drivers, and balance computational cost against robustness to structured outages.
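To make the idea of gap filling as state-space simulation with masks more concrete, the following is a minimal sketch, not the STDiff implementation. A least-squares linear map stands in for the learned diffusion transition model, and the function names `fit_linear_transition` and `rollout_impute` are hypothetical, introduced only for illustration. The sketch assumes a boolean mask that is `True` where a value is observed, and it shows only the mechanism the abstract describes: simulate one step forward, then clamp to observations wherever they exist.

```python
import numpy as np


def fit_linear_transition(x, mask):
    """Fit x[t+1] ~ [x[t], 1] @ W by least squares.

    Stand-in for a learned one-step transition model (assumption).
    Only pairs of consecutive, fully observed rows are used for fitting.
    """
    full = mask.all(axis=1)
    idx = np.where(full[:-1] & full[1:])[0]
    X = np.hstack([x[idx], np.ones((len(idx), 1))])  # append bias column
    Y = x[idx + 1]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def rollout_impute(x, mask, W):
    """Fill gaps by simulating forward one step at a time.

    At each step the transition proposes the next state; observed
    entries overwrite the proposal, missing entries keep it.
    """
    T, _ = x.shape
    filled = np.where(mask, x, 0.0)
    # Initialise missing entries of the first row with observed column means.
    col_mean = np.nanmean(np.where(mask, x, np.nan), axis=0)
    state = np.where(mask[0], x[0], col_mean)
    filled[0] = state
    for t in range(1, T):
        pred = np.append(state, 1.0) @ W          # one-step simulation
        state = np.where(mask[t], x[t], pred)     # clamp to observations
        filled[t] = state
    return filled
```

A call such as `rollout_impute(x, mask, fit_linear_transition(x, mask))` returns an array of the same shape as `x` in which observed entries are unchanged and missing entries are replaced by the simulated trajectory. A sequential rollout like this only uses information before each gap, which is one reason a block-level model with context from both sides, as described for STDiff-W, can be preferable for long outages.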