High-altitude balloons have proved useful for ecological aerial surveys, atmospheric monitoring, and communication relays. However, weight and power constraints create a need to investigate alternative modes of propulsion for navigating the stratosphere. Very recently, reinforcement learning has been proposed as a control scheme for keeping a balloon in the region of a fixed location by exploiting the diverse, opposing wind fields found at different altitudes. Although air-pump-based station-keeping has been explored, there is no research on the control problem for balloons actuated by venting and ballasting, which are commonly used as a low-cost alternative. We show how reinforcement learning can be used for this type of balloon. Specifically, we use the soft actor-critic algorithm, which on average is able to station-keep within 50 km for 25% of the flight, consistent with the state of the art. Furthermore, we show that the proposed controller effectively minimises the consumption of resources, thereby supporting long-duration flights. We frame the controller as a continuous-control reinforcement learning problem, which allows for a more diverse range of trajectories than current state-of-the-art work, which uses discrete action spaces. Continuous control also lets us make use of larger ascent rates than are possible with air pumps. The desired ascent rate is decoupled into a desired altitude and a time factor to provide a more transparent policy than the low-level control commands used in previous works. Finally, by applying the equations of motion, we establish appropriate thresholds for venting and ballasting, so that actions remain physically feasible and the agent cannot exploit the environment.
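The decoupling of the desired ascent rate into a desired altitude and a time factor can be illustrated with a minimal sketch. It assumes that the ascent rate is the altitude error divided by the time factor and that feasibility is enforced by clipping; the abstract does not state the exact mapping. The names `ActionLimits`, `desired_ascent_rate`, and `actuator_for`, and all numeric limits, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionLimits:
    # Illustrative placeholder values, not taken from the paper.
    min_altitude_m: float = 15_000.0
    max_altitude_m: float = 25_000.0
    min_time_factor_s: float = 600.0
    max_time_factor_s: float = 3_600.0
    max_ascent_rate_mps: float = 5.0


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def rescale(unit_value, low, high):
    """Map a value in [-1, 1] linearly onto [low, high]."""
    return low + 0.5 * (unit_value + 1.0) * (high - low)


def desired_ascent_rate(action, current_altitude_m, limits=ActionLimits()):
    """Turn a continuous action (altitude, time factor) into an ascent rate in m/s.

    Assumed mapping: rate = (target altitude - current altitude) / time factor,
    clipped to a feasible magnitude.
    """
    a_alt, a_time = (clip(a, -1.0, 1.0) for a in action)
    target_alt = rescale(a_alt, limits.min_altitude_m, limits.max_altitude_m)
    time_factor = rescale(a_time, limits.min_time_factor_s, limits.max_time_factor_s)
    rate = (target_alt - current_altitude_m) / time_factor
    return clip(rate, -limits.max_ascent_rate_mps, limits.max_ascent_rate_mps)


def actuator_for(rate_mps, deadband_mps=0.05):
    """Pick the actuator: drop ballast to climb, vent gas to descend."""
    if rate_mps > deadband_mps:
        return "ballast"
    if rate_mps < -deadband_mps:
        return "vent"
    return "none"


if __name__ == "__main__":
    rate = desired_ascent_rate((1.0, -1.0), current_altitude_m=20_000.0)
    print(rate, actuator_for(rate))
```

Under these assumptions a policy output is readable as "reach this altitude over roughly this long", and the clip is a stand-in for thresholds that would in practice be derived from the balloon's equations of motion.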