We consider the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs) with linear function approximation. Following recent works, we consider broad classes of problems where either (i) the optimal value function $v^\star$ or (ii) the optimal action-value function $q^\star$ lie in the linear span of some features; or (iii) both $v^\star$ and $q^\star$ lie in the linear span when restricted to the states reachable from the starting state. Recently, Weisz et al. (2021b) showed that under (ii) the minimax query complexity of any planning algorithm is at least exponential in the horizon $H$ or in the feature dimension $d$ when the size $A$ of the action set can be chosen to be exponential in $\min(d,H)$. On the other hand, for the setting (i), Weisz et al. (2021a) introduced TensorPlan, a planner whose query cost is polynomial in all relevant quantities when the number of actions is fixed. Among other things, these two works left open the question whether polynomial query complexity is possible when $A$ is subexponential in $min(d,H)$. In this paper we answer this question in the negative: we show that an exponentially large lower bound holds when $A=\Omega(\min(d^{1/4},H^{1/2}))$, under either (i), (ii) or (iii). In particular, this implies a perhaps surprising exponential separation of query complexity compared to the work of Du et al. (2021) who prove a polynomial upper bound when (iii) holds for all states. Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).
翻译:我们考虑的是在线规划的微质量质询复杂性, 包括固定正数 Markov 决策程序( MDPs ) 的基因模型。 在最近的工作之后, 我们考虑的问题范围很广, 其中( 一) 最佳值函数 $v ⁇ star$ 或 (二) 最佳行动价值函数 $q ⁇ star$ 位于某些特性的线性范围; 或 (三) 美元=star$ 和 $q ⁇ star$ 存在于线性范围内, 仅限于起始状态可以达到的国家。 最近, Weisz 等人( 2021b) 显示, 在 (二) 美元下, 任何规划的微量查询复杂度在地平线上至少是 $++% zarstar, 或者在特性尺寸上是 $1美元, 美元 。 在动作的直径直线上, (一) 维兹等人( 2021) 引入Tensor Plan, 其所有查询成本在所有相关数量中可能是多元的 。 在磁盘变变变数时, 或变变数中, 这些变数在硬的答案中, 。