Reinforcement learning is commonly associated with the training of reward-maximizing (or cost-minimizing) agents, in other words, controllers. It can be applied in a model-free or model-based fashion, using system data collected a priori or online to train the parametric architectures involved. In general, online reinforcement learning does not guarantee closed-loop stability unless special measures are taken, for instance, through learning constraints or tailored training rules. Particularly promising are hybrids of reinforcement learning with "classical" control approaches. In this work, we suggest a method to guarantee practical stability of the system-controller closed loop in a purely online learning setting, i.e., without offline training. Moreover, we assume only partial knowledge of the system model. To achieve the claimed results, we employ techniques of classical adaptive control. The implementation of the overall control scheme is provided explicitly in a digital, sampled setting: the controller receives the state of the system and computes the control action at discrete, equidistant moments in time. The method is tested on adaptive traction control and cruise control, where it was shown to significantly reduce the cost.
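To make the sampled, equidistant-time setting concrete, the following minimal Python sketch simulates a toy scalar plant x_dot = theta*x + u with unknown theta: at each sampling instant the controller reads the state, applies a certainty-equivalence stabilizing action, and updates its parameter estimate online with a gradient-type adaptation law from classical adaptive control. The plant, the gains, and all names are illustrative assumptions, not the scheme developed in the paper.

    dt = 0.01            # sampling period: equidistant control instants
    theta_true = 1.5     # unknown plant parameter (hidden from the controller)
    theta_hat = 0.0      # online estimate of theta_true
    gamma = 5.0          # adaptation gain (assumed value)
    k = 2.0              # stabilizing feedback gain (assumed value)

    x = 1.0              # initial state
    for step in range(1000):
        # Certainty-equivalence action: cancel the estimated drift, then damp.
        u = -theta_hat * x - k * x

        # Plant response between sampling instants (forward-Euler step of
        # x_dot = theta_true * x + u).
        x += dt * (theta_true * x + u)

        # Gradient-type adaptation law; with the Lyapunov function
        # V = x^2/2 + (theta_true - theta_hat)^2/(2*gamma), this choice
        # gives V_dot = -k*x^2 <= 0 in continuous time.
        theta_hat += dt * gamma * x * x

    print(f"final |x| = {abs(x):.4f}, theta_hat = {theta_hat:.3f}")

In this sketch the state decays toward the origin while the estimate stays bounded even if it never reaches theta_true, which mirrors the kind of practical-stability guarantee the abstract refers to.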