Introduction
What This Is
Why We Built This
How This Serves Our Mission
Code Design Philosophy
Support Plan
Installation
Installing Python
Installing OpenMPI
Installing Spinning Up
Check Your Install
Installing MuJoCo (Optional)
Algorithms
What’s Included
Why These Algorithms?
Code Format
Running Experiments
Launching from the Command Line
Launching from Scripts
Experiment Outputs
Algorithm Outputs
Save Directory Location
Loading and Running Trained Policies
Plotting Results
Part 1: Key Concepts in RL
What Can RL Do?
Key Concepts and Terminology
(Optional) Formalism
Part 2: Kinds of RL Algorithms
A Taxonomy of RL Algorithms
Links to Algorithms in Taxonomy
Part 3: Intro to Policy Optimization
Deriving the Simplest Policy Gradient
Implementing the Simplest Policy Gradient
Expected Grad-Log-Prob Lemma
Don’t Let the Past Distract You
Implementing Reward-to-Go Policy Gradient
Baselines in Policy Gradients
Other Forms of the Policy Gradient
Recap
Spinning Up as a Deep RL Researcher
The Right Background
Learn by Doing
Developing a Research Project
Doing Rigorous Research in RL
Closing Thoughts
PS: Other Resources
References
Key Papers in Deep RL
1. Model-Free RL
2. Exploration
3. Transfer and Multitask RL
4. Hierarchy
5. Memory
6. Model-Based RL
7. Meta-RL
8. Scaling RL
9. RL in the Real World
10. Safety
11. Imitation Learning and Inverse Reinforcement Learning
12. Reproducibility, Analysis, and Critique
13. Bonus: Classic Papers in RL Theory or Review
Exercises
Problem Set 1: Basics of Implementation
Problem Set 2: Algorithm Failure Modes
Challenges
Benchmarks for Spinning Up Implementations
Performance in Each Environment
Experiment Details
Vanilla Policy Gradient
Background
Documentation
References
Trust Region Policy Optimization
Background
Documentation
References
Proximal Policy Optimization
Background
Documentation
References
Deep Deterministic Policy Gradient
Background
Documentation
References
Twin Delayed DDPG
Background
Documentation
References
Soft Actor-Critic
Background
Documentation
References
Logger
Using a Logger
Logger Classes
Loading Saved Graphs
Plotter
MPI Tools
Core MPI Utilities
MPI + TensorFlow Utilities
Run Utils
ExperimentGrid
Calling Experiments
Acknowledgements
About the Author