Course Homepage: A Report from the Deep Reinforcement Learning Lab
Source: https://deeppavlov.ai/
Editor: DeepRL
RL#1: 13.02.2020: Exploration in RL
Sergey Ivanov
Random Network Distillation [1]
Intrinsic Curiosity Module [2,3]
Episodic Curiosity through Reachability [4]
Just Heuristic
Imitation Learning[5]
Inverse RL [6,7]
Learning from Human Preferences [8]
Petr Kuderov
A framework for temporal abstraction in RL [9]
The Option-Critic Architecture [10]
FeUdal Networks for Hierarchical RL [11]
Data-Efficient Hierarchical RL [12]
Meta Learning Shared Hierarchies [13]
Evgenia Elistratova
A framework for temporal abstraction in reinforcement learning [14]
Improving Exploration in Evolution Strategies for Deep RL [15]
Paired Open-Ended Trailblazer (POET) [16]
Sim-to-Real: Learning Agile Locomotion For Quadruped Robots [17]
Pavel Shvechikov
A Distributional Perspective on RL [18]
Distributional RL with Quantile Regression [19]
Implicit Quantile Networks for Distributional RL [20]
Fully Parameterized Quantile Function for Distributional RL [21]
Taras Khakhulin
RL for Solving the Vehicle Routing Problem [22]
Attention, Learn to Solve Routing Problems! [23]
Learning Improvement Heuristics for Solving the Travelling Salesman Problem [24]
Learning Combinatorial Optimization Algorithms over Graphs [25]
Pavel Termichev
RL and Control as Probabilistic Inference: Tutorial and Review [26]
RL with Deep Energy-Based Policies [27]
Soft Actor-Critic [28]
Variational Bayesian RL with Regret Bounds [29]
Sergey Sviridov
Stabilising Experience Replay for Deep Multi-Agent RL [30]
Counterfactual Multi-Agent Policy Gradients [31]
Value-Decomposition Networks For Cooperative Multi-Agent Learning [32]
Monotonic Value Function Factorisation for Deep Multi-Agent RL [33]
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments [34]
Evgeny Kashin
DL for Real-Time Atari Game Play Using Offline MCTS Planning [35]
Mastering Chess and Shogi by Self-Play with a General RL Algorithm [36]
World Models [37]
Model-Based RL for Atari [38]
Learning Latent Dynamics for Planning from Pixels [39]
Aleksandr Panin
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour [40]
HOGWILD!: A Lock-Free Approach to Parallelizing SGD [41]
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [42]
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [43]
Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts [44]
Dmitry Nikulin
Universal Value Function Approximators [45]
Hindsight Experience Replay [46]
PathNet: Evolution Channels Gradient Descent in Super Neural Networks [47]
Progressive Neural Networks [48]
Learning an Embedding Space for Transferable Robot Skills [49]
Artyom Sorokin
Recurrent Experience Replay in Distributed RL [50]
AMRL: Aggregated Memory For RL [51]
Unsupervised Predictive Memory in a Goal-Directed Agent [52]
Stabilizing Transformers for RL [53]
Model-Free Episodic Control [54]
Neural Episodic Control [55]
Sergey Kolesnikov
Asynchronous Methods for Deep RL [56]
IMPALA: Scalable Distributed DRL with Importance Weighted Actor-Learner Architectures [57]
Distributed Prioritized Experience Replay [58]
Making Efficient Use of Demonstrations to Solve Hard Exploration Problems [59]
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference [60]
Part 2: Projects
【1】Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives (Hierarchical RL)
Implement the paper in a test environment of your choice.
【2】 HIRO with Hindsight Experience Replay (Hierarchical RL)
Add Hindsight Experience Replay to the HIRO algorithm and compare against plain HIRO (a minimal relabelling sketch is given below).
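As a starting point, here is a minimal, framework-agnostic sketch of hindsight relabelling. Everything in it is an illustrative assumption rather than HIRO's API: goal-conditioned transitions stored as dicts, a sparse distance-based reward, and the "final" relabelling strategy. How the relabelled goals enter HIRO's high-level and low-level replay buffers is left to the project.

```python
import numpy as np

# Minimal, framework-agnostic sketch of hindsight relabelling ("final" strategy).
# All names and conventions here are illustrative assumptions, not HIRO's API:
# transitions are dicts, the reward is 0 within `eps` of the goal and -1 otherwise.

def sparse_reward(achieved, goal, eps=0.05):
    return 0.0 if np.linalg.norm(achieved - goal) < eps else -1.0

def her_relabel(episode):
    """episode: list of dicts with keys 'state', 'action', 'next_state', 'goal'.
    Returns the original transitions plus copies whose goal is replaced by the
    state actually achieved at the end of the episode."""
    achieved_goal = episode[-1]["next_state"]            # "final" HER strategy
    out = []
    for t in episode:
        out.append({**t, "reward": sparse_reward(t["next_state"], t["goal"])})
        out.append({**t, "goal": achieved_goal,
                    "reward": sparse_reward(t["next_state"], achieved_goal)})
    return out

# Tiny usage example with 1-D states: the last relabelled transition gets reward 0.
ep = [
    {"state": np.array([0.0]), "action": 0, "next_state": np.array([0.3]), "goal": np.array([1.0])},
    {"state": np.array([0.3]), "action": 1, "next_state": np.array([0.6]), "goal": np.array([1.0])},
]
for tr in her_relabel(ep):
    print(tr["goal"], tr["reward"])
```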
【3】 Meta Learning Shared Hierarchies on pytorch (Hierarchical RL)
Implement the paper in PyTorch (the authors' implementation uses TensorFlow). Check its results in a test environment of your choice (not one from the paper).
【4】Fast deep Reinforcement learning using online adjustments from the past (Memory in RL)
Try to reproduce the paper or implement the algorithm on a different environment.
Bonus points:
* Comparison with the NEC or a basic DRL algorithm;
* Ablation study.
【5】Episodic Reinforcement Learning with Associative Memory (Memory in RL)
Try to reproduce the paper or implement the algorithm on a different environment.
Bonus points:
* Comparison with the NEC or a basic DRL algorithm;
* Ablation study.
【6】Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization (Inverse RL)
Implement the algorithm and test it on Atari games. Compare results with common baselines.
【7】Non-Monotonic Sequential Text Generation on TF/chainer (Imitation Learning)
Implement the paper in TensorFlow or Chainer.
【8】Evolution Strategies as a Scalable Alternative to Reinforcement Learning (Evolution Strategies)
Implement the algorithm and test it on VizDoom or gym-minigrid. Compare results with available baselines (a minimal sketch of the core ES update is given below).
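For orientation, the following is a minimal sketch of the core ES update from the paper on a toy black-box objective. The objective, population size, noise scale, learning rate, and the use of standardized returns instead of the paper's rank-based fitness shaping are all placeholder assumptions; a real run would evaluate policy-network weights on VizDoom or gym-minigrid episodes.

```python
import numpy as np

# Sketch of the core update from "Evolution Strategies as a Scalable Alternative
# to RL" on a toy black-box objective.  The objective and hyperparameters are
# placeholders; a real run would evaluate policy weights on environment episodes
# and use rank-based fitness shaping.

rng = np.random.default_rng(0)

def fitness(theta):
    goal = np.array([1.0, -2.0, 0.5])        # toy objective: maximize -(theta - goal)^2
    return -np.sum((theta - goal) ** 2)

theta = np.zeros(3)
npop, sigma, alpha = 50, 0.1, 0.03

for _ in range(300):
    eps = rng.normal(size=(npop, theta.size))                  # parameter perturbations
    returns = np.array([fitness(theta + sigma * e) for e in eps])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)  # standardized returns
    theta = theta + alpha / (npop * sigma) * eps.T @ adv       # ES gradient estimate

print(theta)  # moves toward the goal [1.0, -2.0, 0.5]
```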
【9】Improving Exploration in Evolution Strategies for DRL via a Population of Novelty-Seeking Agents (Evolution Strategies)
Implement the algorithm and test it on VizDoom or gym-minigrid. Compare results with available baselines.
【10】Comparative study of intrinsic motivations (Exploration in RL)
Using MountainCar-v0, compare the following (a minimal RND bonus sketch appears after this list):
1) curiosity on forward dynamics model loss;
2) curiosity on inverse dynamics model loss;
3) ICM;
4) RND.
Bonus points:
* Add intrinsic motivation to an off-policy RL algorithm (e.g. DQN or QR-DQN);
* Try MountainCarContinuous-v0.
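Below is a minimal sketch of the RND intrinsic bonus (item 4 above), under assumed conventions: a small fixed random target network, a predictor trained by hand-written SGD, and the squared prediction error used as the bonus. Only the 2-dimensional observation is tied to MountainCar-v0; network sizes and the learning rate are arbitrary placeholders.

```python
import numpy as np

# Sketch of the RND intrinsic bonus: a fixed random "target" network and a
# trainable "predictor"; the squared prediction error is the exploration bonus
# and shrinks for frequently visited states.  Network sizes and the learning
# rate are arbitrary; only OBS_DIM = 2 is tied to MountainCar-v0.

rng = np.random.default_rng(0)
OBS_DIM, HIDDEN, FEAT_DIM = 2, 32, 16

# Fixed random target network (never trained).
W1_t = rng.normal(size=(OBS_DIM, HIDDEN))
W2_t = rng.normal(size=(HIDDEN, FEAT_DIM))
# Trainable predictor network.
W1_p = rng.normal(size=(OBS_DIM, HIDDEN)) * 0.1
W2_p = rng.normal(size=(HIDDEN, FEAT_DIM)) * 0.1

def target(obs):                      # fixed random embedding f(s)
    return np.tanh(obs @ W1_t) @ W2_t

def intrinsic_reward_and_update(obs, lr=1e-2):
    """Return ||f(s) - f_hat(s)||^2 and take one SGD step on the predictor."""
    global W1_p, W2_p
    h = np.tanh(obs @ W1_p)           # predictor hidden layer
    err = h @ W2_p - target(obs)
    bonus = float(np.sum(err ** 2))
    # Manual backprop through the two-layer predictor (gradients at old weights).
    grad_h = (2 * err) @ W2_p.T * (1 - h ** 2)
    W2_p -= lr * np.outer(h, 2 * err)
    W1_p -= lr * np.outer(obs, grad_h)
    return bonus

# The bonus for a repeatedly visited state decreases over time.
s = np.array([-0.5, 0.0])
print([round(intrinsic_reward_and_update(s), 3) for _ in range(5)])
```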
【11】Solving Unity Pyramids (Exploration in RL)
Try to reproduce this experiment using any intrinsic motivation you like.
【12】RND Exploratory Behavior (Exploration in RL)
There was a study of exploratory behaviors induced by curiosity-based intrinsic motivation. Choose any environment, e.g. an Atari game, and investigate the exploratory behavior of RND.
【13】 Learning Improvement Heuristics for Solving the Travelling Salesman Problem (RL for Combinatorial Opt.)
Implement the paper on any combinatorial optimization problem you like. Compare with available solvers (a plain 2-opt baseline sketch is given below).
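For context on the improvement-heuristic setting, here is a plain, non-learned greedy 2-opt baseline on a random Euclidean TSP instance; the paper's contribution is to learn which local move to apply, which this sketch does not do. The instance size and stopping rule are placeholders.

```python
import numpy as np

# Plain greedy 2-opt on a random Euclidean TSP instance, as a non-learned
# baseline for the improvement-heuristic setting.

rng = np.random.default_rng(0)
cities = rng.random((10, 2))          # random 2-D instance

def tour_length(tour):
    return sum(np.linalg.norm(cities[tour[i]] - cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def best_2opt_move(tour):
    """Return the tour given by the single best 2-opt segment reversal
    (or the original tour if no reversal improves it)."""
    best, best_len = tour, tour_length(tour)
    n = len(tour)
    for i in range(n - 1):
        for j in range(i + 2, n):
            cand = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
            cand_len = tour_length(cand)
            if cand_len < best_len:
                best, best_len = cand, cand_len
    return best

tour = list(range(len(cities)))
while True:                           # apply improving moves until none remains
    new_tour = best_2opt_move(tour)
    if new_tour == tour:
        break
    tour = new_tour
print(tour, round(tour_length(tour), 3))
```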
【14】Dynamic Attention Model for Vehicle Routing Problems (RL for Combinatorial Opt.)
Implement the paper on any combinatorial optimization problem you like. Compare with available solvers.
【15】Variational RL with Regret Bounds (Variational RL)
Try to reproduce the K-learning algorithm from the paper. Pick a finite, discrete environment of your choice. Use this paper as an addition to the main one.
Bonus points:
* Compare with the exact version of soft actor-critic or soft Q-learning from here. Hint: use a message-passing algorithm;
* Propose approximate K-learning algorithm with the use of function approximators (neural networks).
Part 3: Course Resources
(End)