Cost-effective asset management is an area of interest across several industries. Specifically, this paper develops a deep reinforcement learning (DRL) solution to automatically determine an optimal rehabilitation policy for continuously deteriorating water pipes. We approach the problem of rehabilitation planning in both online and offline DRL settings. In online DRL, the agent interacts with a simulated environment of multiple pipes with distinct length, material, and failure rate characteristics. We train the agent using deep Q-learning (DQN) to learn an optimal policy that minimizes average costs and reduces failure probability. In offline learning, the agent uses static data, e.g., DQN replay data, to learn an optimal policy via a conservative Q-learning algorithm without further interaction with the environment. We demonstrate that DRL-based policies improve over standard preventive, corrective, and greedy planning alternatives. Additionally, learning from the fixed DQN replay dataset surpasses the online DQN setting. The results suggest that the existing deterioration profiles of water pipes, comprising large and diverse state and action trajectories, provide a valuable avenue for learning rehabilitation policies in the offline setting without needing a simulator.
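The offline approach described above hinges on the conservative Q-learning (CQL) objective, which augments the standard Bellman error with a penalty that keeps Q-value estimates conservative on actions absent from the logged replay data. As a minimal sketch (not the paper's implementation; array shapes, the `alpha` weight, and the simplified CQL(H)-style penalty are illustrative assumptions), the loss could be computed as:

```python
import numpy as np

def cql_loss(q_values, actions, targets, alpha=1.0):
    """Simplified conservative Q-learning loss (illustrative sketch).

    q_values: (batch, n_actions) array of Q(s, .) predictions
    actions:  (batch,) indices of actions taken in the logged data
    targets:  (batch,) Bellman targets r + gamma * max_a' Q_target(s', a')
    alpha:    weight of the conservative penalty term
    """
    batch = np.arange(len(actions))
    q_data = q_values[batch, actions]  # Q(s, a) for the logged actions
    # Conservative penalty: push down Q over all actions (via log-sum-exp)
    # while pushing up Q on the actions actually observed in the dataset.
    logsumexp = np.log(np.exp(q_values).sum(axis=1))
    penalty = (logsumexp - q_data).mean()
    # Standard TD error on the logged transitions.
    bellman = ((q_data - targets) ** 2).mean()
    return alpha * penalty + bellman
```

With `alpha = 0` the loss reduces to the ordinary DQN regression target, which is why the same replay data can serve both the online and offline agents.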