Deep reinforcement learning for irrigation scheduling using high-dimensional sensor feedback

Deep reinforcement learning has considerable potential to improve irrigation scheduling in many cropping systems by applying adaptive amounts of water based on various measurements over time. The goal is to discover an intelligent decision rule that processes the information available to growers and prescribes sensible irrigation amounts for each time step considered. Because the technique is technically novel, however, research on it remains sparse and impractical. To accelerate progress, this paper proposes a general framework and an actionable procedure that allow researchers to formulate their own optimisation problems and implement solution algorithms based on deep reinforcement learning. The effectiveness of the framework was demonstrated in a case study that maximised profits for irrigated wheat grown in a productive region of Australia. Specifically, the decision rule takes nine state variables as inputs: crop phenological stage, leaf area index, extractable soil water for each of the top five soil layers, cumulative rainfall and cumulative irrigation. Every day, it returns a probabilistic prescription over five candidate irrigation amounts (0, 10, 20, 30 and 40 mm). The production system was simulated at Goondiwindi using the APSIM-Wheat crop model. After training in the learning environment with 1981-2010 weather data, the learned decision rule was tested individually for each year from 2011 to 2020. The results were compared against benchmark profits obtained from irrigation schedules optimised individually for each of those years. The discovered decision rule prescribed daily irrigation amounts that achieved more than 96% of the benchmark profits. The framework is general and applicable to a wide range of cropping systems with realistic optimisation problems.
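To make the shape of the decision rule concrete, the following is a minimal sketch in Python with NumPy of a stochastic policy that maps the nine state variables to a probability distribution over the five candidate irrigation amounts. It is an illustration only, not the network used in the paper: the single hidden layer, its width, the tanh activation, the random untrained weights, the identifier names and the assumption that the state has been scaled to the range 0 to 1 are all hypothetical choices made here.

```python
import numpy as np

# The five candidate daily irrigation amounts (mm) listed in the abstract.
IRRIGATION_AMOUNTS_MM = np.array([0, 10, 20, 30, 40])

# The nine state variables listed in the abstract.
# The identifier names themselves are hypothetical.
STATE_NAMES = [
    "phenological_stage",
    "leaf_area_index",
    "esw_layer_1",
    "esw_layer_2",
    "esw_layer_3",
    "esw_layer_4",
    "esw_layer_5",
    "cumulative_rainfall",
    "cumulative_irrigation",
]


def init_policy(rng, n_hidden=16):
    """Create random (untrained) weights for a one-hidden-layer policy.

    The architecture is an assumption made for illustration.
    """
    n_state = len(STATE_NAMES)
    n_actions = len(IRRIGATION_AMOUNTS_MM)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_state, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_actions)),
        "b2": np.zeros(n_actions),
    }


def action_probabilities(params, state):
    """Return a probability for each candidate irrigation amount."""
    hidden = np.tanh(state @ params["W1"] + params["b1"])
    logits = hidden @ params["W2"] + params["b2"]
    logits = logits - logits.max()  # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()  # softmax over the five actions


def prescribe(params, state, rng):
    """Sample one daily irrigation amount (mm) from the policy."""
    probs = action_probabilities(params, state)
    idx = rng.choice(len(IRRIGATION_AMOUNTS_MM), p=probs)
    return int(IRRIGATION_AMOUNTS_MM[idx]), probs


rng = np.random.default_rng(0)
params = init_policy(rng)
# Placeholder state, assumed to be pre-scaled to [0, 1]; in practice these
# values would come from the crop simulation or from field sensors.
state = rng.uniform(0.0, 1.0, size=len(STATE_NAMES))
amount_mm, probs = prescribe(params, state, rng)
print(amount_mm, probs.round(3))
```

In a training loop, the weights would be adjusted by a reinforcement learning algorithm so that amounts leading to higher seasonal profit become more probable; that step is deliberately left out of this sketch.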
