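A six-lecture deep reinforcement learning course taught by Professor Pieter Abbeel of UC Berkeley. Topics covered include MDP basics, value and policy iteration, maximum entropy (max-ent), DQN, policy gradient, TRPO, PPO, DDPG, SAC, and model-based RL.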
Video playlist: https://www.youtube.com/playlist?list=PLwRJQ4m4UJjNymuBM9RdmB3Z9N5-0IlY0