Despite various breakthroughs in machine learning and data analysis techniques for the smart operation and management of urban water infrastructure, some key limitations obstruct progress. Two shortcomings play a crucial role: the lack of freely available data, owing to data privacy or the high cost of data collection, and the scarcity of rare or extreme events in the data that are available. Generative Adversarial Networks (GANs) can help overcome these challenges. In machine learning, generative models are a class of methods that learn the distribution of a dataset in order to generate artificial data. In this study, we developed a GAN model that generates synthetic time series to balance our limited recorded time series data and to improve the accuracy of a data-driven model for combined sewer flow prediction. We considered the sewer system of a small town in Germany as the test case. Precipitation and inflow to the storage tanks are used to develop the data-driven model. The aim is to predict the flow from precipitation data and to examine the impact of data augmentation with synthetic data on model performance. Results show that the GAN can successfully generate synthetic time series that follow the real data distribution, which helps predict peak flows more accurately. However, the model without data augmentation works better for dry weather prediction. Therefore, an ensemble model is suggested to combine the advantages of both models.
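The abstract does not say how the suggested ensemble combines the two models. One possible reading, given here only as a minimal sketch, is a gating rule: windows containing rain are routed to the model trained with GAN-augmented data, and dry windows to the model trained without augmentation. The function name `ensemble_predict`, the `wet_threshold_mm` parameter, and the two toy stand-in models are all hypothetical and are not taken from the study.

```python
from typing import Callable, Sequence

# A "model" here is any callable mapping a precipitation window to a flow value.
Model = Callable[[Sequence[float]], float]


def ensemble_predict(
    precip_window: Sequence[float],
    baseline_model: Model,
    augmented_model: Model,
    wet_threshold_mm: float = 0.1,  # hypothetical cutoff, not from the study
) -> float:
    """Pick a model by weather regime and return its flow prediction.

    Assumption: a window counts as "wet" if any precipitation value in it
    exceeds the threshold. Wet windows use the augmented model (better at
    peaks); dry windows use the baseline model (better in dry weather).
    """
    is_wet = max(precip_window) > wet_threshold_mm
    model = augmented_model if is_wet else baseline_model
    return model(precip_window)


# Toy stand-ins for the two trained models (illustrative only).
def baseline_model(window: Sequence[float]) -> float:
    return 5.0  # constant dry weather flow


def augmented_model(window: Sequence[float]) -> float:
    return 5.0 + 2.0 * sum(window)  # flow rises with total rainfall


dry_window = [0.0, 0.0, 0.0]
wet_window = [0.0, 1.5, 3.0]

print(ensemble_predict(dry_window, baseline_model, augmented_model))
print(ensemble_predict(wet_window, baseline_model, augmented_model))
```

A hard switch like this is the simplest option; a weighted blend of both predictions would be an equally plausible alternative under the same assumptions.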