Reported by AI Era (新智元)
Contents
I. Books
[Reinforcement Learning: An Introduction](#Reinforcement Learning: An Introduction)
[Algorithms for Reinforcement Learning](#Algorithms for Reinforcement Learning)
OpenAI-spinningup
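II. Courses
1. Foundational courses
[Rich Sutton's reinforcement learning course (Alberta)](#Rich Sutton's reinforcement learning course (Alberta))
[David Silver's reinforcement learning course (UCL)](#David Silver's reinforcement learning course (UCL))
[Stanford reinforcement learning course](#Stanford reinforcement learning course)
[UCL + SJTU Multi-Agent Reinforcement Learning Tutorial](#Multi-Agent Reinforcement Learning Tutorial)
2. Deep RL courses
[National Taiwan University, Hung-yi Lee: (Deep) Reinforcement Learning](#National Taiwan University, Hung-yi Lee: (Deep) Reinforcement Learning)
[UC Berkeley Deep Reinforcement Learning course](#UC Berkeley Deep Reinforcement Learning course)
[CMU Deep Reinforcement Learning course](#CMU Deep Reinforcement Learning course)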
Reinforcement Learning: An Introduction
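Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction. Update: the final version of the second edition is available (click "online draft"): link. The official copy is hosted on Google Docs, so I downloaded a copy and put it on GitHub; help yourself if you need it.
Note: the print edition can now be ordered. A classmate and I each ordered a copy from overseas through Amazon, and they have not arrived yet. Within China, JD and Amazon China are also options, though they cost a bit more.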
Algorithms for Reinforcement Learning
Csaba Szepesvari, Algorithms for Reinforcement Learning
Link: https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf
OpenAI-spinningup
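This is more of a mixed resource than a book: online docs, accompanying code, and accompanying exercises. I strongly recommend reading it alongside the UCL course; I skimmed through it and it is quite good.
However, it does not mention the UCL and UCB courses below or Sutton's book above; reading them together may work better:
Online documentation: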
http://spinningup.openai.com/en/latest/
Introduction to the fundamentals of reinforcement learning: http://spinningup.openai.com/en/latest/spinningup/rl_intro.html
Advice on deep reinforcement learning: http://spinningup.openai.com/en/latest/spinningup/spinningup.html
Code:
https://github.com/openai/spinningup/tree/master/spinup
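Foundational courses
Rich Sutton's reinforcement learning course (Alberta)
Course homepage: http://incompleteideas.net/rlai.cs.ualberta.ca/RLAI/RLAIcourse/RLAIcourse2006.html
This one is fairly old. A newer version is on Google Drive; I will find time to organize it.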
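David Silver's reinforcement learning course (UCL)
Note: this is the course David Silver taught at UCL in 2015. He now seems to be at the peak of his career at DeepMind, so a new course will probably have to wait until the day he decides to return to academia and train students. Highly recommended for beginners who want to build up the basic RL concepts.
Course homepage: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
Slides: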
Lecture 1: Introduction to Reinforcement Learning
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/intro_RL.pdf
Lecture 2: Markov Decision Processes
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf
Lecture 3: Planning by Dynamic Programming
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/DP.pdf
Lecture 4: Model-Free Prediction
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MC-TD.pdf
Lecture 5: Model-Free Control
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/control.pdf
Lecture 6: Value Function Approximation
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf
Lecture 7: Policy Gradient Methods
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/pg.pdf
Lecture 8: Integrating Learning and Planning
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/dyna.pdf
Lecture 9: Exploration and Exploitation
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/XX.pdf
Lecture 10: Case Study: RL in Classic Games
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/games.pdf
Stanford reinforcement learning course
Note: this is the spring 2018 offering.
Course homepage: http://web.stanford.edu/class/cs234/schedule.html
Slides:
Introduction to Reinforcement Learning
http://web.stanford.edu/class/cs234/slides/cs234_2018_l1.pdf
How to act given know how the world works. Tabular setting. Markov processes. Policy search. Policy iteration. Value iteration
http://web.stanford.edu/class/cs234/slides/cs234_2018_l2.pdf
Learning to evaluate a policy when don't know how the world works.
http://web.stanford.edu/class/cs234/slides/cs234_2018_l3.pdf
Model-free learning to make good decisions. Q-learning. SARSA.
http://web.stanford.edu/class/cs234/slides/cs234_2018_l4.pdf
Scaling up: value function approximation. Deep Q Learning.
http://web.stanford.edu/class/cs234/slides/cs234_2018_l5.pdf
Deep reinforcement learning continued.
http://web.stanford.edu/class/cs234/slides/cs234_2018_l6.pdf
Imitation Learning.
http://web.stanford.edu/class/cs234/slides/cs234_2018_l7_annotated.pdf
Policy search.
http://web.stanford.edu/class/cs234/slides/cs234_2018_l8.pdf
Policy search.
http://web.stanford.edu/class/cs234/slides/cs234_2018_l9_updated.pdf
Midterm review.
http://web.stanford.edu/class/cs234/slides/cs234_2018_midterm_review.pdf
Fast reinforcement learning (Exploration/Exploitation) Part I.
http://web.stanford.edu/class/cs234/slides/cs234_2018_l11.pdf
Fast reinforcement learning (Exploration/Exploitation) Part II.
http://web.stanford.edu/class/cs234/slides/cs234_2018_l12.pdf
Batch Reinforcement Learning.
http://web.stanford.edu/class/cs234/slides/cs234_2018_l13.pdf
Monte Carlo Tree Search.
http://web.stanford.edu/class/cs234/slides/cs234_2018_l14.pdf
Human in the loop RL with a focus on transfer learning.
http://web.stanford.edu/class/cs234/slides/cs234_2018_l15.pdf
Multi-Agent Reinforcement Learning Tutorial
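Note: while interning with Alibaba's advertising group, I was fortunate to write a paper with Prof. Wang and Prof. Zhang. Along the way I saw how lively and sharp Prof. Wang's thinking is. Prof. Zhang, meanwhile, strikes me as a rising star in computer science in China and is well worth following!
Course homepage: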
http://wnzhang.net/tutorials/marl2018/index.html
Fundamentals of Reinforcement Learning
http://wnzhang.net/tutorials/marl2018/docs/lecture-1-rl.pdf
Fundamentals of Game Theory
http://wnzhang.net/tutorials/marl2018/docs/lecture-2a-game-theory.pdf
Learning in Repeated Games
http://wnzhang.net/tutorials/marl2018/docs/lecture-2b-repeated-games.pdf
Multi-Agent Reinforcement Learning
http://wnzhang.net/tutorials/marl2018/docs/lecture-3a-marl-1.pdf
Deep RL courses
National Taiwan University, Hung-yi Lee: (Deep) Reinforcement Learning
Course homepage:
http://speech.ee.ntu.edu.tw/~tlkagk/courses/
The videos are available on Bilibili:
https://www.bilibili.com/video/av24724071?from=search&seid=14814651069494196110
UC Berkeley Deep Reinforcement Learning course
Course homepage:
http://rail.eecs.berkeley.edu/deeprlcourse/
Lecture slides: see the syllabus for more information.
For details, see the link in the original post:
https://github.com/wwxFromTju/awesome-reinforcement-learning-zh#%E4%B9%A6
CMU Deep Reinforcement Learning course
Updated for fall 2018.
Fall 2018 course homepage:
http://www.andrew.cmu.edu/course/10-703/
2017 course homepage:
https://katefvision.github.io/
Slides:
Introduction
https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture1_intro.pdf
Markov decision processes (MDPs), POMDPs
https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture2_mdps.pdf
Solving known MDPs: Dynamic Programming
http://www.andrew.cmu.edu/course/10-703/slides/lecture3_exactmethods-9-5-2018.pdf
Policy iteration, Value iteration, Asynchronous DP
http://www.andrew.cmu.edu/course/10-703/slides/lecture4_valuePolicyDP-9-10-2018.pdf
Monte Carlo Learning, Temporal difference learning, Q learning
http://www.andrew.cmu.edu/course/10-703/slides/Lecture5_MC_9-12-2018.pdf
Temporal difference learning (Tom), Planning and learning: Dyna, Monte Carlo tree search
http://www.andrew.cmu.edu/course/10-703/slides/TDshort-9-17-2018.pdf
Deep NN Architectures for RL
https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture_NNarchitecturesforRL_katef.pdf
Recitation on Monte Carlo Tree Search
https://www.cs.cmu.edu/~katef/DeepRLFall2018/MCTS_katef.pdf
VF approximation, MC, TD with VF approximation, Control with VF approximation
https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture_FAkatef.pdf
Deep Q Learning: Double Q learning, replay memory
https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture_DQL_katef2018.pdf
Advanced Policy Gradients
http://www.andrew.cmu.edu/course/10-703/slides/Lecture_PG-NatGrad-10-8-2018.pdf
Evolution Methods, Natural Gradients
http://www.andrew.cmu.edu/course/10-703/slides/Lecture_async_evolution.pdf
Natural Policy Gradients, TRPO, PPO, ACKTR
http://www.andrew.cmu.edu/course/10-703/slides/Lecture_NaturalPolicyGradientsTRPOPPO.pdf
Pathwise Derivatives, DDPG, multigoal RL, HER
http://www.andrew.cmu.edu/course/10-703/slides/Lecture_DDPGMultigoalRL.pdf
Exploration vs. Exploitation
http://www.andrew.cmu.edu/course/10-703/slides/Lecture_Exploration-10-22-2018.pdf
Exploration and RL in Animals
http://www.andrew.cmu.edu/course/10-703/slides/Lecture_exploration.pdf
Model-based Reinforcement Learning
http://www.andrew.cmu.edu/course/10-703/slides/Lecture_modelbasedRL.pdf
Imitation Learning
http://www.andrew.cmu.edu/course/10-703/slides/Lecture_Imitation_supervised-Nov-5-2018.pdf
Maximum Entropy Inverse RL, Adversarial imitation learning
http://www.andrew.cmu.edu/course/10-703/slides/Lecture_IRL_GAIL.pdf
Recitation: Trajectory optimization - iterative LQR
https://katefvision.github.io/katefSlides/RECITATIONtrajectoryoptimization_katef.pdf
Original post:
https://github.com/wwxFromTju/awesome-reinforcement-learning-zh#%E4%B9%A6