We introduce a generic template for developing regret minimization algorithms in the Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as certain properties are ensured. The key to our analysis is a new technique called implicit finite-horizon approximation, which approximates the SSP model by a finite-horizon counterpart only in the analysis, without ever implementing it explicitly. Using this template, we develop two new algorithms: the first is model-free (the first in the literature, to our knowledge) and minimax optimal under strictly positive costs; the second is model-based and minimax optimal even with zero-cost state-action pairs, matching the best existing result from [Tarbouriech et al., 2021b]. Importantly, both algorithms admit highly sparse updates, making them computationally more efficient than all existing algorithms. Moreover, both can be made completely parameter-free.
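To make the notion of a finite-horizon counterpart concrete, the following is a minimal sketch on a toy instance. It is not the algorithm or the analysis of this work, where the approximation is used only implicitly. It simply runs backward induction for a fixed number of steps on a small SSP and shows the resulting values approaching the SSP optimal cost-to-go as the horizon grows. The function name, the data layout, and all numbers are hypothetical choices made for illustration.

```python
def finite_horizon_values(costs, trans, horizon):
    """Backward induction on a finite-horizon copy of an SSP.

    costs[s][a] : cost of taking action a in non-goal state s
    trans[s][a] : dict {next non-goal state: probability}; any missing
                  probability mass is assumed to go to the goal state,
                  whose value is 0
    Returns the optimal expected cost over `horizon` steps, with zero
    terminal cost charged if the goal has not been reached by then.
    """
    n = len(costs)
    v = [0.0] * n  # value with 0 steps remaining
    for _ in range(horizon):
        # One Bellman backup: add one more step to the remaining horizon.
        v = [
            min(
                costs[s][a] + sum(p * v[s2] for s2, p in trans[s][a].items())
                for a in range(len(costs[s]))
            )
            for s in range(n)
        ]
    return v


# Toy SSP (illustrative numbers): one non-goal state, two actions.
# Action 0: cost 1, reaches the goal w.p. 0.5, otherwise stays put.
# Action 1: cost 3, reaches the goal with certainty.
# The SSP optimal cost-to-go is 2 (always play action 0).
costs = [[1.0, 3.0]]
trans = [[{0: 0.5}, {}]]

for H in (1, 2, 5, 10, 40):
    print(H, finite_horizon_values(costs, trans, H)[0])
```

On this instance the gap between the finite-horizon value and the SSP optimum halves with every additional step, which is why a moderately long horizon already gives a close approximation.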