强化学习是一种学习范式,它关注于如何学习控制一个系统,从而最大化表达一个长期目标的数值性能度量。强化学习与监督学习的区别在于,对于学习者的预测,只向学习者提供部分反馈。此外,预测还可能通过影响被控系统的未来状态而产生长期影响。因此,时间起着特殊的作用。强化学习的目标是开发高效的学习算法,以及了解算法的优点和局限性。强化学习具有广泛的实际应用价值,从人工智能到运筹学或控制工程等领域。在这本书中,我们重点关注那些基于强大的动态规划理论的强化学习算法。我们给出了一个相当全面的学习问题目录,描述了核心思想,关注大量的最新算法,然后讨论了它们的理论性质和局限性。
Preface ix
Acknowledgments xiii
Markov Decision Processes 1
Preliminaries 1
Markov Decision Processes 1
Value functions 6
Dynamic programming algorithms for solving MDPs 10
Value Prediction Problems 11
TD(lambda) with function approximation 22
Gradient temporal difference learning 25
Least-squares methods 27
The choice of the function space 33
Tabular TD(0) 11
Every-visit Monte-Carlo 14
TD(lambda): Unifying Monte-Carlo and TD(0) 16
Temporal difference learning in finite state spaces 11
Algorithms for large state spaces 18
Control 37
Implementing a critic 54
Implementing an actor 56
Q-learning in finite MDPs 47
Q-learning with function approximation 49
Online learning in bandits 38
Active learning in bandits 40
Active learning in Markov Decision Processes 41
Online learning in Markov Decision Processes 42
A catalog of learning problems 37
Closed-loop interactive learning 38
Direct methods 47
Actor-critic methods 52
For Further Exploration 63
Further reading 63
Applications 63
Software 64
Appendix: The Theory of Discounted Markovian Decision Processes 65
A.1 Contractions and Banach’s fixed-point theorem 65
A.2 Application to MDPs 69
Bibliography 73
Author's Biography 89
https://sites.ualberta.ca/~szepesva/rlbook.html
专知便捷查看
便捷下载,请关注专知公众号(点击上方蓝色专知关注)
后台回复“A98” 可以获取《【干货书】强化学习算法,98页pdf综合讲解人工智能和机器学习》专知下载链接索引