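Reinforcement Learning from Getting Started to Giving Up: A Big Giveaway of Resources from Jun Wang, Hung-yi Lee, and Other Experts!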

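December 29, 2018 · Xinzhiyuan (新智元)

Xinzhiyuan report

Source: GitHub

Editor: 三石

[Xinzhiyuan Guide] This article compiles reinforcement learning materials from introductory to advanced level, covering books and courses, including valuable material from leading researchers such as Hung-yi Lee and Jun Wang. We hope readers find it useful.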

I. Books

Reinforcement Learning: An Introduction
Algorithms for Reinforcement Learning
OpenAI-spinningup

II. Courses

1. Basic Courses

Rich Sutton's Reinforcement Learning Course (Alberta)
David Silver's Reinforcement Learning Course (UCL)
Stanford Reinforcement Learning Course
UCL + SJTU Multi-Agent Reinforcement Learning Tutorial

2. Deep Reinforcement Learning (DRL) Courses

Hung-yi Lee's (Deep) Reinforcement Learning, National Taiwan University
UC Berkeley Deep Reinforcement Learning Course
CMU Deep Reinforcement Learning Course

Books

Reinforcement Learning: An Introduction

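Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction. Update: the final version of the second edition (click "online draft"): link. Because the official copy is hosted on Google Docs, I downloaded it and put a copy on GitHub; grab it there if you need it.

Note: The physical book is now available to buy. A classmate and I each ordered a copy from Amazon overseas, and they have not arrived yet. Within China, JD and Amazon China are options, though they cost somewhat more.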

Algorithms for Reinforcement Learning

Csaba Szepesvari, Algorithms for Reinforcement Learning

Link: https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf

OpenAI-spinningup

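This one is more of a mixed resource than a book: online docs, matching code, and matching exercises. I strongly recommend reading it alongside the UCL course; I have skimmed through it and it is quite good.

However, it does not mention the UCL and UCB courses listed below, nor Sutton's book above; reading them together may work better.

Online documentation:

http://spinningup.openai.com/en/latest/

Introduction to the basics of reinforcement learning: http://spinningup.openai.com/en/latest/spinningup/rl_intro.html

Advice on deep reinforcement learning: http://spinningup.openai.com/en/latest/spinningup/spinningup.html

Code: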

https://github.com/openai/spinningup/tree/master/spinup

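Courses

Basic Courses

Rich Sutton's Reinforcement Learning Course (Alberta)

Course homepage

Link: http://incompleteideas.net/rlai.cs.ualberta.ca/RLAI/RLAIcourse/RLAIcourse2006.html

This one is fairly old. There is a newer version on Google Drive, which I will organize when I find the time.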

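David Silver's Reinforcement Learning Course (UCL)

Note: This is the course David Silver taught at UCL in 2015. He now seems to be at the peak of his career at DeepMind, so a new course will probably have to wait until he decides to return to a university to train students. Highly recommended for beginners who want to build up the basic RL concepts.

Course homepage: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html

Slides: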

Lecture 1: Introduction to Reinforcement Learning

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/intro_RL.pdf

Lecture 2: Markov Decision Processes

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf

Lecture 3: Planning by Dynamic Programming

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/DP.pdf

Lecture 4: Model-Free Prediction

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MC-TD.pdf

Lecture 5: Model-Free Control

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/control.pdf

Lecture 6: Value Function Approximation

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf

Lecture 7: Policy Gradient Methods

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/pg.pdf

Lecture 8: Integrating Learning and Planning

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/dyna.pdf

Lecture 9: Exploration and Exploitation

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/XX.pdf

Lecture 10: Case Study: RL in Classic Games

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/games.pdf

Stanford Reinforcement Learning Course

Note: this is the Spring 2018 offering.

Course homepage: http://web.stanford.edu/class/cs234/schedule.html

Slides:

Introduction to Reinforcement Learning

http://web.stanford.edu/class/cs234/slides/cs234_2018_l1.pdf

How to act given knowledge of how the world works. Tabular setting. Markov processes. Policy search. Policy iteration. Value iteration.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l2.pdf

Learning to evaluate a policy when we don't know how the world works.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l3.pdf

Model-free learning to make good decisions. Q-learning. SARSA.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l4.pdf

Scaling up: value function approximation. Deep Q Learning.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l5.pdf

Deep reinforcement learning continued.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l6.pdf

Imitation Learning.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l7_annotated.pdf

Policy search.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l8.pdf

Policy search.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l9_updated.pdf

Midterm review.

http://web.stanford.edu/class/cs234/slides/cs234_2018_midterm_review.pdf

Fast reinforcement learning (Exploration/Exploitation) Part I.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l11.pdf

Fast reinforcement learning (Exploration/Exploitation) Part II.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l12.pdf

Batch Reinforcement Learning.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l13.pdf

Monte Carlo Tree Search.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l14.pdf

Human in the loop RL with a focus on transfer learning.

http://web.stanford.edu/class/cs234/slides/cs234_2018_l15.pdf

Multi-Agent Reinforcement Learning Tutorial

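Note: While interning with Alibaba's advertising team, I was fortunate to write a paper with Prof. Wang and Prof. Zhang. Along the way I saw how lively and sharp Prof. Wang's thinking is. Prof. Zhang also strikes me as a rising star in computer science in China, well worth following!

Course homepage: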

http://wnzhang.net/tutorials/marl2018/index.html

Fundamentals of Reinforcement Learning

http://wnzhang.net/tutorials/marl2018/docs/lecture-1-rl.pdf

Fundamentals of Game Theory

http://wnzhang.net/tutorials/marl2018/docs/lecture-2a-game-theory.pdf

Learning in Repeated Games

http://wnzhang.net/tutorials/marl2018/docs/lecture-2b-repeated-games.pdf

Multi-Agent Reinforcement Learning

http://wnzhang.net/tutorials/marl2018/docs/lecture-3a-marl-1.pdf

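Deep Reinforcement Learning (DRL) Courses

Hung-yi Lee's (Deep) Reinforcement Learning, National Taiwan University

Course homepage:

http://speech.ee.ntu.edu.tw/~tlkagk/courses/

The videos are available on Bilibili: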

https://www.bilibili.com/video/av24724071?from=search&seid=14814651069494196110

UC Berkeley Deep Reinforcement Learning Course

Course homepage:

http://rail.eecs.berkeley.edu/deeprlcourse/

Lecture slides: see the syllabus for more information.

For details, see the links in the original post:

https://github.com/wwxFromTju/awesome-reinforcement-learning-zh#%E4%B9%A6

CMU Deep Reinforcement Learning Course

Updated: Fall 2018

Fall 2018 course homepage:

http://www.andrew.cmu.edu/course/10-703/

2017 course homepage:

https://katefvision.github.io/

Slides:

Introduction

https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture1_intro.pdf

Markov decision processes (MDPs), POMDPs

https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture2_mdps.pdf

Solving known MDPs: Dynamic Programming

http://www.andrew.cmu.edu/course/10-703/slides/lecture3_exactmethods-9-5-2018.pdf

Policy iteration, Value iteration, Asynchronous DP

http://www.andrew.cmu.edu/course/10-703/slides/lecture4_valuePolicyDP-9-10-2018.pdf

Monte Carlo Learning, Temporal difference learning, Q learning

http://www.andrew.cmu.edu/course/10-703/slides/Lecture5_MC_9-12-2018.pdf

Temporal difference learning (Tom), Planning and learning: Dyna, Monte Carlo tree search

http://www.andrew.cmu.edu/course/10-703/slides/TDshort-9-17-2018.pdf

Deep NN Architectures for RL

https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture_NNarchitecturesforRL_katef.pdf

Recitation on Monte Carlo Tree Search

https://www.cs.cmu.edu/~katef/DeepRLFall2018/MCTS_katef.pdf

VF approximation, MC, TD with VF approximation, Control with VF approximation

https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture_FAkatef.pdf

Deep Q Learning: Double Q learning, replay memory

https://www.cs.cmu.edu/~katef/DeepRLFall2018/lecture_DQL_katef2018.pdf

Advanced Policy Gradients

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_PG-NatGrad-10-8-2018.pdf

Evolution Methods, Natural Gradients

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_async_evolution.pdf

Natural Policy Gradients, TRPO, PPO, ACKTR

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_NaturalPolicyGradientsTRPOPPO.pdf

Pathwise Derivatives, DDPG, multigoal RL, HER

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_DDPGMultigoalRL.pdf

Exploration vs. Exploitation

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_Exploration-10-22-2018.pdf

Exploration and RL in Animals

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_exploration.pdf

Model-based Reinforcement Learning

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_modelbasedRL.pdf

Imitation Learning

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_Imitation_supervised-Nov-5-2018.pdf

Maximum Entropy Inverse RL, Adversarial imitation learning

http://www.andrew.cmu.edu/course/10-703/slides/Lecture_IRL_GAIL.pdf

Recitation: Trajectory optimization - iterative LQR

https://katefvision.github.io/katefSlides/RECITATIONtrajectoryoptimization_katef.pdf

Original link:

https://github.com/wwxFromTju/awesome-reinforcement-learning-zh#%E4%B9%A6
