In this paper, we present a brief survey of Reinforcement Learning (RL), with particular emphasis on Stochastic Approximation (SA) as a unifying theme. The scope of the paper includes Markov Reward Processes, Markov Decision Processes, Stochastic Approximation algorithms, and widely used algorithms such as Temporal Difference Learning and $Q$-learning.