We propose an algorithm that uses linear function approximation (LFA) for the stochastic shortest path (SSP) problem. Under minimal assumptions, it obtains sublinear regret, is computationally efficient, and uses stationary policies. To our knowledge, this is the first such algorithm in the LFA literature (for SSP or other formulations). Our algorithm is a special case of a more general one, which achieves regret that scales as the square root of the number of episodes, given access to a certain computation oracle.