We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, it was unclear prior to this work whether such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory and that is also known to be empirically valuable for learning state representations.
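The closing point above, that predicting multi-step rather than single-step costs matters, can be illustrated with a toy NumPy sketch. It is not the algorithm analyzed in this work. It assumes noiseless observations y = C x where C has orthonormal columns (so a single observation determines the state, whereas LQG calls for representations of observation histories), zero control inputs, and a rank-one cost matrix Q. The hypothetical helper `learn_cost_driven_representation` fits the k-step cost as a quadratic form y^T S y by least squares and factors S = M^T M to obtain a linear representation z = M y; `k_step_costs` and the matrices `A`, `C`, `Q` are likewise illustrative choices, not taken from the source.

```python
import numpy as np


def k_step_costs(X, A, Q, k):
    """Sum of x_i^T Q x_i over a k-step zero-input rollout x_{i+1} = A x_i."""
    costs = np.zeros(X.shape[0])
    Xi = X.copy()
    for _ in range(k):
        costs += np.einsum("ni,ij,nj->n", Xi, Q, Xi)
        Xi = Xi @ A.T  # advance every sample by one step
    return costs


def learn_cost_driven_representation(Y, costs, tol=1e-6):
    """Hypothetical helper: fit costs ~ y^T S y, then factor S = M^T M.

    Returns M (r x d); the latent state is z = M y and ||z||^2 predicts the cost.
    """
    N, d = Y.shape
    # Quadratic features vec(y y^T) in row-major order, so theta.reshape(d, d) is S.
    feats = np.einsum("ni,nj->nij", Y, Y).reshape(N, d * d)
    # Minimum-norm least squares; rcond discards numerically zero directions.
    theta, *_ = np.linalg.lstsq(feats, costs, rcond=1e-10)
    S = theta.reshape(d, d)
    S = 0.5 * (S + S.T)
    w, V = np.linalg.eigh(S)
    keep = w > tol * w.max()  # keep only directions the cost actually reveals
    return np.sqrt(w[keep])[:, None] * V[:, keep].T


# Toy system (assumption): 3-dim state observed through 8-dim noiseless y = C x.
rng = np.random.default_rng(0)
n, d, N = 3, 8, 200
A = np.array([[0.9, 0.5, 0.0], [0.0, 0.8, 0.4], [0.0, 0.0, 0.7]])
Q = np.diag([1.0, 0.0, 0.0])  # rank-one cost: only x[0] is penalized
C, _ = np.linalg.qr(rng.standard_normal((d, n)))  # orthonormal columns
X = rng.standard_normal((N, n))
Y = X @ C.T

for k in (1, 2, 3):
    M = learn_cost_driven_representation(Y, k_step_costs(X, A, Q, k))
    print(f"horizon k={k}: representation rank {M.shape[0]}")
```

With these choices the learned representation should have rank 1, 2 and 3 for k = 1, 2, 3: the single-step cost only exposes the penalized coordinate, while a horizon of k = n = 3 steps exposes the whole state because (Q, A) is observable here.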