spinningup.openai: A Complete Set of Reinforcement Learning Resources

December 17, 2018 · CreateAMind

Welcome to Spinning Up in Deep RL!

User Documentation

Introduction

What This Is
Why We Built This
How This Serves Our Mission
Code Design Philosophy
Support Plan

Installation

Installing Python
Installing OpenMPI
Installing Spinning Up
Check Your Install
Installing MuJoCo (Optional)

Algorithms

What’s Included
Why These Algorithms?
Code Format

Running Experiments

Launching from the Command Line
Launching from Scripts

Experiment Outputs

Algorithm Outputs
Save Directory Location
Loading and Running Trained Policies

Plotting Results

Introduction to RL

Part 1: Key Concepts in RL

What Can RL Do?
Key Concepts and Terminology
(Optional) Formalism

Part 2: Kinds of RL Algorithms

A Taxonomy of RL Algorithms
Links to Algorithms in Taxonomy

Part 3: Intro to Policy Optimization

Deriving the Simplest Policy Gradient
Implementing the Simplest Policy Gradient
Expected Grad-Log-Prob Lemma
Don’t Let the Past Distract You
Implementing Reward-to-Go Policy Gradient
Baselines in Policy Gradients
Other Forms of the Policy Gradient
Recap

Resources

Spinning Up as a Deep RL Researcher

The Right Background
Learn by Doing
Developing a Research Project
Doing Rigorous Research in RL
Closing Thoughts
PS: Other Resources
References

Key Papers in Deep RL

1. Model-Free RL
2. Exploration
3. Transfer and Multitask RL
4. Hierarchy
5. Memory
6. Model-Based RL
7. Meta-RL
8. Scaling RL
9. RL in the Real World
10. Safety
11. Imitation Learning and Inverse Reinforcement Learning
12. Reproducibility, Analysis, and Critique
13. Bonus: Classic Papers in RL Theory or Review

Exercises

Problem Set 1: Basics of Implementation
Problem Set 2: Algorithm Failure Modes
Challenges

Benchmarks for Spinning Up Implementations

Performance in Each Environment
Experiment Details

Algorithms Docs

Vanilla Policy Gradient

Background
Documentation
References

Trust Region Policy Optimization

Background
Documentation
References

Proximal Policy Optimization

Background
Documentation
References

Deep Deterministic Policy Gradient

Background
Documentation
References

Twin Delayed DDPG

Background
Documentation
References

Soft Actor-Critic

Background
Documentation
References

Utilities Docs

Logger

Using a Logger
Logger Classes
Loading Saved Graphs

Plotter
MPI Tools

Core MPI Utilities
MPI + TensorFlow Utilities

Run Utils

ExperimentGrid
Calling Experiments

Etc.

Acknowledgements
About the Author

Indices and tables

Index
Module Index
Search Page
