Exploration in unknown environments is a fundamental problem in reinforcement learning and control. In this work, we study task-guided exploration and determine precisely what an agent must learn about its environment in order to complete a particular task. Formally, we study a broad class of decision-making problems in the setting of linear dynamical systems, a class that includes the linear quadratic regulator (LQR) problem. We provide instance- and task-dependent lower bounds which explicitly quantify the difficulty of completing a task of interest. Motivated by our lower bound, we propose a computationally efficient, experiment-design-based exploration algorithm. We show that it optimally explores the environment, collecting precisely the information needed to complete the task, and provide finite-time bounds guaranteeing that it achieves the instance- and task-optimal sample complexity, up to constant factors. Through several examples of the LQR problem, we show that performing task-guided exploration provably improves on exploration schemes that do not take the task of interest into account. Along the way, we establish that certainty-equivalence decision making is instance- and task-optimal, and obtain the first instance-optimal algorithm for the LQR problem. We conclude with several experiments illustrating the effectiveness of our approach in practice.