The popular LSPE($\lambda$) algorithm for policy evaluation is revisited to derive a concentration bound that gives high-probability performance guarantees from some time on.