Policy Decomposition (PoDec) is a framework that lessens the curse of dimensionality when deriving policies for optimal control problems. For a given system representation, i.e., the state variables and control inputs describing a system, PoDec generates strategies to decompose the joint optimization of policies for all control inputs. Policies for different inputs are thereby derived in a decoupled or cascaded fashion, each as a function of a subset of the state variables, reducing computation. However, the choice of system representation is crucial, as it dictates the suboptimality of the resulting policies. We present a heuristic method for finding a representation more amenable to decomposition. Our approach is based on the observation that every decomposition enforces a sparsity pattern in the resulting policies at the cost of optimality, and that a representation which already yields a sparse optimal policy is likely to produce decompositions with lower suboptimality. Since the optimal policy is not known, we construct a system representation that sparsifies its LQR approximation. For a simplified biped, a 4-degree-of-freedom manipulator, and a quadcopter, we discover decompositions that offer a 10% reduction in trajectory costs over those identified by vanilla PoDec. Moreover, the decomposition policies produce trajectories with substantially lower costs than policies obtained from state-of-the-art reinforcement learning algorithms.
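To make the sparsity intuition concrete, the following is a minimal sketch (not the paper's actual method) of inspecting the sparsity pattern of an LQR gain. It assumes a hypothetical toy system whose linearized dynamics happen to be block-diagonal, so the LQR gain is sparse: near-zero entries of the gain matrix reveal which state variables each input's policy can ignore, which is the kind of structure a decomposition would exploit.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical linearized dynamics x_dot = A x + B u for a toy 4-state,
# 2-input system, block-diagonal by construction so the LQR gain is sparse.
A = np.block([
    [np.array([[0.0, 1.0], [0.0, 0.0]]), np.zeros((2, 2))],
    [np.zeros((2, 2)), np.array([[0.0, 1.0], [-1.0, 0.0]])],
])
B = np.block([
    [np.array([[0.0], [1.0]]), np.zeros((2, 1))],
    [np.zeros((2, 1)), np.array([[0.0], [1.0]])],
])
Q = np.eye(4)  # state cost
R = np.eye(2)  # input cost

# Solve the continuous-time algebraic Riccati equation and form the
# LQR gain K, so that u = -K x approximates the optimal policy locally.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P

# Entries of K near zero indicate state variables an input does not need;
# here input 0 depends only on states 0-1, input 1 only on states 2-3.
sparsity = np.abs(K) > 1e-6
print(sparsity)
```

In this representation the gain's sparsity pattern immediately suggests a decoupled decomposition (one policy per subsystem); after a coordinate change that mixes the two subsystems, the same dynamics would produce a dense gain, illustrating why the choice of representation matters.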