Cost-effective asset management is an area of interest across several industries. Specifically, this paper develops a deep reinforcement learning (DRL) solution that automatically determines an optimal rehabilitation policy for continuously deteriorating water pipes. We approach the rehabilitation-planning problem in both online and offline DRL settings. In the online setting, the agent interacts with a simulated environment of multiple pipes with distinct lengths, materials, and failure-rate characteristics; we train the agent with deep Q-learning (DQN) to learn an optimal policy that minimizes average cost and reduces failure probability. In the offline setting, the agent learns an optimal policy from static data, e.g., DQN replay data, via a conservative Q-learning algorithm, without further interaction with the environment. We demonstrate that DRL-based policies improve over standard preventive, corrective, and greedy planning alternatives. Additionally, learning from the fixed DQN replay dataset in the offline setting further improves performance. The results suggest that the existing deterioration profiles of water pipes, which comprise large and diverse state and action trajectories, provide a valuable avenue for learning rehabilitation policies in the offline setting, which can then be fine-tuned using the simulator.