Temporally Layered Architecture for Adaptive, Distributed and Continuous Control

We present the temporally layered architecture (TLA), a biologically inspired system for temporally adaptive distributed control. TLA combines a fast controller and a slow controller to achieve temporal abstraction, so that each layer can focus on a different timescale. The design draws on the architecture of the human brain, which executes actions at different timescales depending on the demands of the environment. Such distributed control is widespread across biological systems because it increases survivability and accuracy in both certain and uncertain environments. We demonstrate that TLA offers several advantages over existing approaches, including persistent exploration, adaptive control, explainable temporal behavior, compute efficiency and distributed control. We present two algorithms for training TLA: (a) closed-loop control, where the fast controller is trained on top of a pre-trained slow controller, which gives the fast controller better exploration and lets it decide at each timestep whether to act or not ("act-or-not"); and (b) partially open-loop control, where the slow controller is trained on top of a pre-trained fast controller, and at each decision point the slow controller either picks a temporally extended action, executed open loop, or defers the next n actions to the fast controller. We evaluate our method on a suite of continuous control tasks and demonstrate the advantages of TLA over several strong baselines.
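To make the two execution modes concrete, here is a minimal Python sketch of their run-time control loops only; it does not train anything. Everything in it is a hypothetical assumption rather than the authors' implementation: the toy `PointEnv` task, the rule in mode (a) that the slow action is held between slow decisions and replaced whenever the fast layer chooses to act, the fixed horizon `n` in mode (b), and all function names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Policy = Callable[[float], float]


@dataclass
class PointEnv:
    """Hypothetical toy task: drive a 1-D point from `pos` toward 0."""
    pos: float = 5.0

    def step(self, action: float) -> float:
        # Clip the velocity command to [-1, 1] and move for one timestep.
        self.pos += max(-1.0, min(1.0, action))
        return self.pos


def closed_loop_episode(env: PointEnv, slow: Policy, fast: Policy,
                        act: Callable[[float], bool],
                        slow_period: int, steps: int) -> Tuple[List[float], int]:
    """(a) Closed loop: the fast layer decides "act-or-not" at every step.

    Assumed scheme: the slow layer is queried every `slow_period` steps and
    its action is held in between; when `act(obs)` is True, the fast layer's
    action replaces the held one. Returns positions and fast-layer calls.
    """
    held, fast_calls = 0.0, 0
    trajectory: List[float] = []
    for t in range(steps):
        obs = env.pos
        if t % slow_period == 0:
            held = slow(obs)  # coarse, temporally abstract decision
        action = held
        if act(obs):  # the fast layer chooses to act at this timestep
            action = fast(obs)
            fast_calls += 1
        trajectory.append(env.step(action))
    return trajectory, fast_calls


def partially_open_loop_episode(
        env: PointEnv,
        slow: Callable[[float], Tuple[str, Optional[float]]],
        fast: Policy, n: int, steps: int) -> Tuple[List[float], int]:
    """(b) Partially open loop: at each decision point the slow layer either
    repeats one action for n steps without reading the state ("repeat"),
    or defers the next n actions to the fast layer ("defer").
    Returns positions and fast-layer calls.
    """
    trajectory: List[float] = []
    fast_calls, t = 0, 0
    while t < steps:
        mode, action = slow(env.pos)
        for _ in range(min(n, steps - t)):
            if mode == "defer":
                action = fast(env.pos)  # closed-loop step by the fast layer
                fast_calls += 1
            trajectory.append(env.step(action))  # open loop when "repeat"
            t += 1
    return trajectory, fast_calls


def coarse(x: float) -> float:
    """Hypothetical slow policy: full speed toward the origin."""
    return -1.0 if x > 0 else 1.0


def fine(x: float) -> float:
    """Hypothetical fast policy: proportional correction toward 0."""
    return -x


def coarse_or_defer(x: float) -> Tuple[str, Optional[float]]:
    """Hypothetical slow policy for (b): repeat far away, defer near 0."""
    return ("repeat", -1.0) if abs(x) > 2 else ("defer", None)


if __name__ == "__main__":
    print(closed_loop_episode(PointEnv(), coarse, fine,
                              act=lambda x: abs(x) < 1.5,
                              slow_period=3, steps=8))
    print(partially_open_loop_episode(PointEnv(), coarse_or_defer, fine,
                                      n=2, steps=6))
```

With these toy policies, the fast layer acts in only the last four of eight steps in mode (a), all near the goal, and is called for only two of six steps in mode (b). The call counters are merely a rough stand-in for the compute-efficiency argument, not a measurement of it.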

翻译：我们展示了时间层结构(TLA),这是一个时间层结构(TLA),这是一个时间适应性分布控制受生物启发的系统。TLA层是一个快速和慢控制器,可以一起实现时间抽象,让每个层能够专注于不同的时间尺度。我们的设计是生物学启发的,并吸收了在不同的时间尺度上执行行动的人类大脑结构,这取决于环境的需求。这种分布式控制设计在生物系统之间十分广泛,因为它在某些和不确定的环境中增加了生存性和准确性。我们证明TLA可以对现有方法提供许多优势,包括持续的探索、适应性控制、可解释的时间行为、计算效率和分布式控制。我们提出了两种不同的算法,用于培训TLA:(a) 闭环控制,在这种算法中,快速控制器在经过预先训练的慢速控制器中可以更好地探索快速控制器和闭环控制器,因为快速控制器在每一个时间步骤上都会增加“行动”的可靠性和准确性;以及(b)部分开放循环控制器,在这种情况下,慢控制器在经过预先训练的下一个快速控制器,允许打开回路控制器,以便进行开放式控制器的开放控制,以便进行开放式控制,以便进行开放式循环控制,从而允许打开打开可打开的开放控制室的开放控制,在一些快速定位的优势,我们可以选择一些的基线上选择。我们快速控制器可以选择一些。