Course Homepage: A Report from the Deep Reinforcement Learning Lab
Source: https://deeppavlov.ai/
Editor: DeepRL
RL#1: 13.02.2020: Exploration in RL
Sergey Ivanov
Random Network Distillation [1]
Intrinsic Curiosity Module [2,3]
Episodic Curiosity through Reachability [4]
Just Heuristic
Imitation Learning[5]
Inverse RL [6,7]
Learning from Human Preferences [8]
Petr Kuderov
A framework for temporal abstraction in RL [9]
The Option-Critic Architecture [10]
FeUdal Networks for Hierarchical RL [11]
Data-Efficient Hierarchical RL [12]
Meta Learning Shared Hierarchies [13]
Evgenia Elistratova
A framework for temporal abstraction in reinforcement learning [14]
Improving Exploration in Evolution Strategies for Deep RL [15]
Paired Open-Ended Trailblazer (POET) [16]
Sim-to-Real: Learning Agile Locomotion For Quadruped Robots [17]
Pavel Shvechikov
A Distributional Perspective on RL [18]
Distributional RL with Quantile Regression [19]
Implicit Quantile Networks for Distributional RL [20]
Fully Parameterized Quantile Function for Distributional RL [21]
Taras Khakhulin
RL for Solving the Vehicle Routing Problem [22]
Attention, Learn to Solve Routing Problems! [23]
Learning Improvement Heuristics for Solving the Travelling Salesman Problem [24]
Learning Combinatorial Optimization Algorithms over Graphs [25]
Pavel Termichev
RL and Control as Probabilistic Inference: Tutorial and Review [26]
RL with Deep Energy-Based Policies [27]
Soft Actor-Critic [28]
Variational Bayesian RL with Regret Bounds [29]
Sergey Sviridov
Stabilising Experience Replay for Deep Multi-Agent RL [30]
Counterfactual Multi-Agent Policy Gradients [31]
Value-Decomposition Networks For Cooperative Multi-Agent Learning [32]
Monotonic Value Function Factorisation for Deep Multi-Agent RL [33]
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments [34]
Evgeny Kashin
DL for Real-Time Atari Game Play Using Offline MCTS Planning [35]
Mastering Chess and Shogi by Self-Play with a General RL Algorithm [36]
World Models [37]
Model-Based RL for Atari [38]
Learning Latent Dynamics for Planning from Pixels [39]
Aleksandr Panin
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour [40]
HOGWILD!: A Lock-Free Approach to Parallelizing SGD [41]
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [42]
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [43]
Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts [44]
Dmitry Nikulin
Universal Value Function Approximators [45]
Hindsight Experience Replay [46]
PathNet: Evolution Channels Gradient Descent in Super Neural Networks [47]
Progressive Neural Networks [48]
Learning an Embedding Space for Transferable Robot Skills [49]
Artyom Sorokin
Recurrent Experience Replay in Distributed RL [50]
AMRL: Aggregated Memory For RL [51]
Unsupervised Predictive Memory in a Goal-Directed Agent [52]
Stabilizing Transformers for RL [53]
Model-Free Episodic Control [54]
Neural Episodic Control [55]
Sergey Kolesnikov
Asynchronous Methods for Deep RL [56]
IMPALA: Scalable Distributed DRL with Importance Weighted Actor-Learner Architectures [57]
Distributed Prioritized Experience Replay [58]
Making Efficient Use of Demonstrations to Solve Hard Exploration Problems [59]
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference [60]
Part 2: Projects
【1】Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives (Hierarchical RL)
Implement the paper in a test environment of your choice.
【2】 HIRO with Hindsight Experience Replay (Hierarchical RL)
Add Hindsight Experience Replay to the HIRO algorithm and compare against plain HIRO (a minimal relabelling sketch is given below).
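As a starting point, here is a minimal, framework-agnostic sketch of hindsight relabelling. Everything in it is an illustrative assumption rather than HIRO's API: goal-conditioned transitions stored as dicts, a sparse distance-based reward, and the "final" relabelling strategy. How the relabelled goals enter HIRO's high-level and low-level replay buffers is left to the project.

```python
import numpy as np

# Minimal, framework-agnostic sketch of hindsight relabelling ("final" strategy).
# All names and conventions here are illustrative assumptions, not HIRO's API:
# transitions are dicts, the reward is 0 within `eps` of the goal and -1 otherwise.

def sparse_reward(achieved, goal, eps=0.05):
    return 0.0 if np.linalg.norm(achieved - goal) < eps else -1.0

def her_relabel(episode):
    """episode: list of dicts with keys 'state', 'action', 'next_state', 'goal'.
    Returns the original transitions plus copies whose goal is replaced by the
    state actually achieved at the end of the episode."""
    achieved_goal = episode[-1]["next_state"]            # "final" HER strategy
    out = []
    for t in episode:
        out.append({**t, "reward": sparse_reward(t["next_state"], t["goal"])})
        out.append({**t, "goal": achieved_goal,
                    "reward": sparse_reward(t["next_state"], achieved_goal)})
    return out

# Tiny usage example with 1-D states: the last relabelled transition gets reward 0.
ep = [
    {"state": np.array([0.0]), "action": 0, "next_state": np.array([0.3]), "goal": np.array([1.0])},
    {"state": np.array([0.3]), "action": 1, "next_state": np.array([0.6]), "goal": np.array([1.0])},
]
for tr in her_relabel(ep):
    print(tr["goal"], tr["reward"])
```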
【3】 Meta Learning Shared Hierarchies on pytorch (Hierarchical RL)
Implement the paper in PyTorch (the authors' implementation uses TensorFlow). Check its results in a test environment of your choice (not one from the paper).
【4】Fast deep Reinforcement learning using online adjustments from the past (Memory in RL)
Try to reproduce the paper or implement the algorithm on a different environment.
Bonus points:
* Comparison with the NEC or a basic DRL algorithm;
* Ablation study.
【5】Episodic Reinforcement Learning with Associative Memory (Memory in RL)
Try to reproduce the paper or implement the algorithm on a different environment.
Bonus points:
* Comparison with the NEC or a basic DRL algorithm;
* Ablation study.
【6】Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization (Inverse RL)
Implement the algorithm and test it on Atari games. Compare results with common baselines.
【7】Non-Monotonic Sequential Text Generation on TF/chainer (Imitation Learning)
Implement the paper in TensorFlow or Chainer.
【8】Evolution Strategies as a Scalable Alternative to Reinforcement Learning (Evolution Strategies)
Implement the algorithm and test it on VizDoom or gym-minigrid. Compare results with available baselines (a minimal sketch of the core ES update is given below).
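For orientation, the following is a minimal sketch of the core ES update from the paper on a toy black-box objective. The objective, population size, noise scale, learning rate, and the use of standardized returns instead of the paper's rank-based fitness shaping are all placeholder assumptions; a real run would evaluate policy-network weights on VizDoom or gym-minigrid episodes.

```python
import numpy as np

# Sketch of the core update from "Evolution Strategies as a Scalable Alternative
# to RL" on a toy black-box objective.  The objective and hyperparameters are
# placeholders; a real run would evaluate policy weights on environment episodes
# and use rank-based fitness shaping.

rng = np.random.default_rng(0)

def fitness(theta):
    goal = np.array([1.0, -2.0, 0.5])        # toy objective: maximize -(theta - goal)^2
    return -np.sum((theta - goal) ** 2)

theta = np.zeros(3)
npop, sigma, alpha = 50, 0.1, 0.03

for _ in range(300):
    eps = rng.normal(size=(npop, theta.size))                  # parameter perturbations
    returns = np.array([fitness(theta + sigma * e) for e in eps])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)  # standardized returns
    theta = theta + alpha / (npop * sigma) * eps.T @ adv       # ES gradient estimate

print(theta)  # moves toward the goal [1.0, -2.0, 0.5]
```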
【9】Improving Exploration in Evolution Strategies for DRL via a Population of Novelty-Seeking Agents (Evolution Strategies)
Implement the algorithm and test it on VizDoom or gym-minigrid. Compare results with available baselines.
【10】Comparative study of intrinsic motivations (Exploration in RL)
Using MountainCar-v0, compare the following (a minimal RND bonus sketch appears after this list):
1) curiosity on forward dynamics model loss;
2) curiosity on inverse dynamics model loss;
3) ICM;
4) RND.
Bonus points:
* Add intrinsic motivation to an off-policy RL algorithm (e.g. DQN or QR-DQN);
* Try MountainCarContinuous-v0.
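Below is a minimal sketch of the RND intrinsic bonus (item 4 above), under assumed conventions: a small fixed random target network, a predictor trained by hand-written SGD, and the squared prediction error used as the bonus. Only the 2-dimensional observation is tied to MountainCar-v0; network sizes and the learning rate are arbitrary placeholders.

```python
import numpy as np

# Sketch of the RND intrinsic bonus: a fixed random "target" network and a
# trainable "predictor"; the squared prediction error is the exploration bonus
# and shrinks for frequently visited states.  Network sizes and the learning
# rate are arbitrary; only OBS_DIM = 2 is tied to MountainCar-v0.

rng = np.random.default_rng(0)
OBS_DIM, HIDDEN, FEAT_DIM = 2, 32, 16

# Fixed random target network (never trained).
W1_t = rng.normal(size=(OBS_DIM, HIDDEN))
W2_t = rng.normal(size=(HIDDEN, FEAT_DIM))
# Trainable predictor network.
W1_p = rng.normal(size=(OBS_DIM, HIDDEN)) * 0.1
W2_p = rng.normal(size=(HIDDEN, FEAT_DIM)) * 0.1

def target(obs):                      # fixed random embedding f(s)
    return np.tanh(obs @ W1_t) @ W2_t

def intrinsic_reward_and_update(obs, lr=1e-2):
    """Return ||f(s) - f_hat(s)||^2 and take one SGD step on the predictor."""
    global W1_p, W2_p
    h = np.tanh(obs @ W1_p)           # predictor hidden layer
    err = h @ W2_p - target(obs)
    bonus = float(np.sum(err ** 2))
    # Manual backprop through the two-layer predictor (gradients at old weights).
    grad_h = (2 * err) @ W2_p.T * (1 - h ** 2)
    W2_p -= lr * np.outer(h, 2 * err)
    W1_p -= lr * np.outer(obs, grad_h)
    return bonus

# The bonus for a repeatedly visited state decreases over time.
s = np.array([-0.5, 0.0])
print([round(intrinsic_reward_and_update(s), 3) for _ in range(5)])
```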
【11】Solving Unity Pyramids (Exploration in RL)
Try to reproduce this experiment using any intrinsic motivation you like.
【12】RND Exploratory Behavior (Exploration in RL)
There was a study of exploratory behaviors induced by curiosity-based intrinsic motivation. Choose any environment, e.g. an Atari game, and investigate the exploratory behavior of RND.
【13】 Learning Improvement Heuristics for Solving the Travelling Salesman Problem (RL for Combinatorial Opt.)
Implement the paper on any combinatorial optimization problem you like. Compare with available solvers (a plain 2-opt baseline sketch is given below).
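For context on the improvement-heuristic setting, here is a plain, non-learned greedy 2-opt baseline on a random Euclidean TSP instance; the paper's contribution is to learn which local move to apply, which this sketch does not do. The instance size and stopping rule are placeholders.

```python
import numpy as np

# Plain greedy 2-opt on a random Euclidean TSP instance, as a non-learned
# baseline for the improvement-heuristic setting.

rng = np.random.default_rng(0)
cities = rng.random((10, 2))          # random 2-D instance

def tour_length(tour):
    return sum(np.linalg.norm(cities[tour[i]] - cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def best_2opt_move(tour):
    """Return the tour given by the single best 2-opt segment reversal
    (or the original tour if no reversal improves it)."""
    best, best_len = tour, tour_length(tour)
    n = len(tour)
    for i in range(n - 1):
        for j in range(i + 2, n):
            cand = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
            cand_len = tour_length(cand)
            if cand_len < best_len:
                best, best_len = cand, cand_len
    return best

tour = list(range(len(cities)))
while True:                           # apply improving moves until none remains
    new_tour = best_2opt_move(tour)
    if new_tour == tour:
        break
    tour = new_tour
print(tour, round(tour_length(tour), 3))
```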
【14】Dynamic Attention Model for Vehicle Routing Problems (RL for Combinatorial Opt.)
Implement the paper on any combinatorial optimization problem you like. Compare with available solvers.
【15】Variational RL with Regret Bounds (Variational RL)
Try to reproduce the K-learning algorithm from the paper. Pick a finite, discrete environment of your choice. Use this paper as an addition to the main one.
Bonus points:
* Compare with the exact version of soft actor-critic or soft Q-learning from here. Hint: use a message-passing algorithm;
* Propose approximate K-learning algorithm with the use of function approximators (neural networks).
Part 3: Course Resources
(End)