Road maintenance planning is an integral part of road asset management. One of the main challenges in Maintenance and Rehabilitation (M&R) practice is determining the type and timing of maintenance. This research proposes a Reinforcement Learning (RL) framework, built on the Long Term Pavement Performance (LTPP) database, to determine the type and timing of M&R treatments. A predictive deep neural network (DNN) model is developed first and serves as the Environment for the RL algorithm. For Policy estimation, both Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) models were developed; PPO was ultimately selected because of its better convergence and higher sample efficiency. The condition indicators used in this study are the International Roughness Index (IRI) and Rutting Depth (RD). A Cracking Metric (CM) was initially considered as a third indicator, but it was excluded because far less data was available for it than for the other indicators, which lowered the accuracy of the results. In the cost-effectiveness calculation (the reward), both the economic and the environmental impacts of M&R treatments are considered; costs and environmental impacts were evaluated with the paLATE 2.0 software. The method is tested on a hypothetical case study of a 23-kilometer, six-lane highway in Texas, which has a warm and wet climate. The result is a 20-year M&R plan under which the road stays in the excellent condition range. Because the road starts at a good level of service, no heavy maintenance is needed in the first years. Later, heavy M&R actions are followed by several periods of one to two years in which no treatment is needed. These patterns indicate that the proposed plan is logical. Decision-makers and transportation agencies can use this scheme to plan maintenance that avoids wasted budget while minimizing environmental impacts.
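To make the interaction between the predictive model (the Environment) and the reward concrete, the loop described above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: a hand-written deterioration function stands in for the DNN, a fixed threshold rule stands in for the learned PPO policy, and every name and number (`TREATMENTS`, `predict_next`, the wear rates, the cost and impact values, the reward weights) is a hypothetical placeholder rather than a figure from the LTPP database or paLATE.

```python
from dataclasses import dataclass

# Hypothetical treatment table:
# (IRI reduction in m/km, RD reduction in mm, cost, environmental impact).
# All values are illustrative placeholders, not results from the study.
TREATMENTS = {
    "do_nothing": (0.0, 0.0, 0.0, 0.0),
    "light": (0.2, 1.0, 1.0, 0.5),
    "heavy": (1.0, 5.0, 5.0, 3.0),
}


@dataclass
class RoadState:
    iri: float  # International Roughness Index, m/km
    rd: float   # rutting depth, mm


def predict_next(state: RoadState, action: str) -> RoadState:
    """Stand-in for the predictive DNN: apply the treatment, then one year of wear."""
    d_iri, d_rd, _, _ = TREATMENTS[action]
    iri = max(0.5, state.iri - d_iri) + 0.1  # assumed yearly roughness growth
    rd = max(0.0, state.rd - d_rd) + 0.5     # assumed yearly rutting growth
    return RoadState(iri, rd)


def reward(next_state: RoadState, action: str,
           w_cost: float = 1.0, w_env: float = 1.0) -> float:
    """Negative of a condition penalty plus weighted cost and environmental impact."""
    _, _, cost, env = TREATMENTS[action]
    condition_penalty = next_state.iri + 0.1 * next_state.rd
    return -(condition_penalty + w_cost * cost + w_env * env)


def threshold_policy(state: RoadState) -> str:
    """Fixed rule used here in place of a trained PPO policy (assumed thresholds)."""
    if state.iri > 2.0 or state.rd > 10.0:
        return "heavy"
    if state.iri > 1.5:
        return "light"
    return "do_nothing"


def rollout(initial: RoadState, years: int = 20):
    """Simulate a multi-year plan and return (plan, total reward)."""
    state, plan, total = initial, [], 0.0
    for year in range(1, years + 1):
        action = threshold_policy(state)
        state = predict_next(state, action)
        total += reward(state, action)
        plan.append((year, action))
    return plan, total


if __name__ == "__main__":
    plan, total = rollout(RoadState(iri=1.0, rd=2.0))
    for year, action in plan:
        print(year, action)
    print("total reward:", round(total, 2))
```

In an actual RL setup, `threshold_policy` would be replaced by the policy being trained, and `predict_next` by the trained predictive model; the sketch only shows how state, action, and a combined economic and environmental reward fit together over a 20-year horizon.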