Reinforcement Learning - Zhuanzhi Topic

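Reinforcement learning (RL) is an area of machine learning concerned with how software agents should take actions in an environment to maximize cumulative reward. Alongside supervised and unsupervised learning, it is one of the three basic machine learning paradigms. RL differs from supervised learning in that it needs neither labeled input/output pairs nor explicit correction of suboptimal actions. Instead, the focus is on balancing exploration (of uncharted territory) and exploitation (of current knowledge). The environment is typically formulated as a Markov decision process (MDP), because many RL algorithms for this setting use dynamic programming techniques. The main difference between classical dynamic programming methods and RL algorithms is that the latter do not assume an exact mathematical model of the MDP and target large MDPs for which exact methods become infeasible.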
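To make the exploration/exploitation trade-off and the "no exact model" point concrete, here is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. The five-state chain environment, the `step` function, and all hyperparameter values are hypothetical choices made for illustration; they do not come from any of the resources listed on this page.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment. The agent only observes sampled transitions and
    never reads the transition rule itself (model-free setting)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only on reaching the goal
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: try a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick the best known action, ties broken at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target from the sampled transition only.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)  # [1, 1, 1, 1], i.e. always move right toward the goal
```

A dynamic programming method such as value iteration would instead sweep over every state using the known transition rule; the sketch above learns the same greedy behavior purely from sampled experience, which is what makes this family of methods usable when the model is unknown or the state space is too large to enumerate.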

Knowledge Collection

Reinforcement Learning: Zhuanzhi Collection

Last updated: March 2, 2022

Introductory Tutorials/Courses

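【A Popular Introduction to Reinforcement Learning】by Liao Guangming (廖光明)
● https://insights.thoughtworks.cn/reinforcement-learning/
【Introductory Reinforcement Learning Tutorial】by Mofan Zhou (周沫凡)
● The course covers little theory and focuses on code implementations of RL algorithms, with an emphasis on practical application.
● Slides, materials, and lecture videos: https://mofanpy.com/tutorials/machine-learning/reinforcement-learning/
【Introductory Deep Reinforcement Learning Tutorial】by Hung-yi Lee (李宏毅)
● An introductory RL course that is friendly to beginners. The lecturer's examples are vivid and engaging, which helps a great deal in understanding the concepts. The course is mainly theoretical, with little on practical applications or code, so consider completing the course assignments to deepen your understanding of the algorithms.
● Slides and videos: https://speech.ee.ntu.edu.tw/~hylee/mlds/2018-spring.php
● Course videos: https://www.bilibili.com/video/av24724071
【OpenAI Reinforcement Learning Tutorial】QbitAI (量子位)
● Introduction: https://zhuanlan.zhihu.com/p/49087870 . Very beginner-friendly, with concise, readable code. From a set of key concepts, to implementations of key algorithms, to a list of essential papers, and finally to warm-up exercises, every step aims for clarity and is written from the beginner's perspective.
● Online tutorial: https://spinningup.openai.com/en/latest/
【Reinforcement Learning from Beginner to Expert Series】Ailin (WeChat official account "AI与强化学习")
● The author covers Markov decision processes, dynamic programming, Monte Carlo methods, temporal-difference learning, and other popular algorithms, helping readers understand RL from scratch.
● https://mp.weixin.qq.com/s/BwaEAUbmeTrMyitZNHAdaQ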

Advanced Tutorials/Courses

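"Introduction to Reinforcement Learning", a classic 10-lecture course taught by David Silver of DeepMind
● A classic 10-part course. Although recorded in 2015, it remains an essential resource for anyone who wants to learn the fundamentals of RL.
● Zhuanzhi link (with course slides): https://www.zhuanzhi.ai/vip/a1d4eeb867d14cf59d98cbbed6e8f0bb
● Original course link: https://deepmind.com/learning-resources/-introduction-reinforcement-learning-david-silver
● PDF notes (draft) by Ye Qiang (叶强):
https://zhuanlan.zhihu.com/p/37690204 , https://pan.baidu.com/s/14Jxp3AGPJFgoFkHa4gXgxA#list/path=%2F
"Foundations of Reinforcement Learning 2018" course videos, 37 lectures, Prof. Zhihua Zhang (张志华), Peking University
● https://resource.pku.edu.cn/index.php?r=course/detail&id=303
● Mainly covers the mathematical foundations underlying RL
● RL algorithm software provided by the course: https://github.com/liber145/rlpack
【Deep Reinforcement Learning Course, 2020】UC Berkeley
● Slides and videos: http://rail.eecs.berkeley.edu/deeprlcourse/ , https://www.youtube.com/playlist?list=PL_iWQOsE6TfURIIhCrlt-wj9ByIVpbfGc
【Stanford University Reinforcement Learning Course, 2022】
● Lecture notes: http://web.stanford.edu/class/cs234/modules.html
【Shanghai Jiao Tong University Multi-Agent Reinforcement Learning Course】
● The tutorial first introduces the topic of machine awareness, then the fundamentals of reinforcement learning and game theory, and finally discusses advanced multi-agent RL algorithms and their latest applications.
● Lecture notes: http://wnzhang.net/tutorials/marl2018/index.html
【Carnegie Mellon University: Deep Reinforcement Learning and Control】
● Lecture slides: https://katefvision.github.io/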

Books

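【New manuscript】Alekh Agarwal, Nan Jiang, Sham M. Kakade, "Reinforcement Learning: Theory and Algorithms" (2022 edition), 205-page PDF
● Source: https://alekhagarwal.net/ , https://rltheorybook.github.io/
● E-book, 2022 edition: https://rltheorybook.github.io/rltheorybook_AJKS.pdf
● The authors are prominent researchers in the RL community with distinguished records. The book presents recent results in RL in an accessible and engaging way, and is a timely resource for researchers who are new to the field.
【Book】Shusen Wang (王树森) and Zhihua Zhang (张志华), "Deep Reinforcement Learning (Draft)", 289-page PDF
● Source: Prof. Zhihua Zhang's homepage https://www.math.pku.edu.cn/teachers/zhzhang/
● Deep Reinforcement Learning (Draft): https://www.math.pku.edu.cn/teachers/zhzhang/drl_v1.pdf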
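【Book】Richard Sutton's classic textbook "Reinforcement Learning: An Introduction", second edition, 548-page PDF
● Source: http://incompleteideas.net/book/the-book-2nd.html
● Electronic version: http://incompleteideas.net/book/RLbook2020.pdf
● Chinese translation: https://zhuanlan.zhihu.com/studyRL
● Code: http://incompleteideas.net/book/code/code2nd.html
● Essential foundational reading that helps in grasping the core ideas of RL
● The book has three parts and seventeen chapters. Synced (机器之心) has published a brief overview of its introduction and structure, together with the full table of contents, course code, and materials.
● From the book: we explore a computational approach to learning from interaction. Rather than directly theorizing about how people or animals learn, we explore idealized learning situations and evaluate the effectiveness of various learning methods. That is, we adopt the perspective of an artificial intelligence researcher or engineer. We explore designs for machines that are effective in solving learning problems of scientific or economic interest, evaluating the designs through mathematical analysis or computational experiments. The approach we explore, called reinforcement learning, is much more focused on goal-directed learning from interaction than are other approaches to machine learning.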
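【Book】Mihai Surdeanu, University of Arizona, "A Gentle Introduction to Deep Learning for Natural Language Processing", 69-page PDF
● http://clulab.cs.arizona.edu/gentlenlp/gentlenlp-book-05172020.pdf
● The book aims to bridge theory and practice for deep learning in natural language processing. It covers the necessary theoretical background while assuming minimal machine learning background, so that anyone who has taken linear algebra and calculus courses can follow the theoretical material. On the practical side, it includes pseudocode for the simpler algorithms discussed and actual Python code for the more complicated architectures. Anyone who has taken a Python programming course should be able to follow the code. After reading it, readers should have the foundation to start building real-world, practical NLP systems right away and to extend their knowledge by reading research publications on these topics.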
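【Book】O'Reilly, "Reinforcement Learning: Industrial Applications of Intelligent Agents", 408-page PDF
● https://rl-book.com/
● Covers everything from basic building blocks to state-of-the-art practice. You will explore the current state of RL, focus on industrial applications, learn numerous algorithms, and benefit from dedicated chapters on deploying RL solutions to production. This is not a cookbook: it does not shy away from math and expects familiarity with ML.
● Learn what RL is and how the algorithms help solve problems; master RL fundamentals, including Markov decision processes, dynamic programming, and temporal-difference learning; dive deep into a range of value and policy gradient methods; apply advanced RL solutions such as meta-learning, hierarchical learning, multi-agent, and imitation learning; understand cutting-edge deep RL algorithms including Rainbow, PPO, TD3, and SAC; and get practical examples through the accompanying website.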
【Book】Prof. Aske Plaat, Leiden University, "Deep Reinforcement Learning", 406-page PDF
● https://deep-reinforcement-learning.net/
● https://arxiv.org/pdf/2201.02135.pdf
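● The aim of this book is to present the latest insights in deep reinforcement learning in a single volume, suitable for teaching a one-semester graduate-level course. Besides covering state-of-the-art algorithms, it covers the necessary background in classic reinforcement learning and deep learning, as well as advanced, forward-looking developments in self-play, multi-agent, hierarchical, and meta-learning.
【Book】Abhishek Nandy, Manisha Biswas, "Reinforcement Learning With Open AI TensorFlow and Keras Using Python", 174-page PDF
● Electronic version: https://pan.baidu.com/s/1nQpNbhkI-3WucSD0Mk7Qcg (extraction code: av5p)
● Practice-oriented
【Book】"Algorithms for Reinforcement Learning"
● Original link: https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf
● Fairly concise, with an emphasis on mathematical logic and rigorous derivation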

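Surveys

Chinese

Qin Zhihui, Li Ning, Liu Xiaotong, et al. A survey of model-free reinforcement learning [无模型强化学习研究综述]. Computer Science (计算机科学), 2021, 48(3): 180-187.
Yu Li, Du Qihan, Yue Boyan, et al. A survey of reinforcement learning based recommendation [基于强化学习的推荐研究综述]. Computer Science (计算机科学), 2021, 48(10): 1-18.
Liu Xiao, Liu Shuyang, Zhuang Yunkai, et al. Explainable reinforcement learning: basic problems and a survey of methods [强化学习可解释性基础问题探索和方法综述]. Journal of Software (软件学报), 2021: 0-0.
Sun Changyin, Mu Chaoxu. Some key scientific problems in multi-agent deep reinforcement learning [多智能体深度强化学习的若干关键科学问题]. Acta Automatica Sinica (自动化学报), 2020, 46(7): 1301-1312.
Chen Jinyin, Zhang Yan, Wang Xueke, Cai Hongbin, Wang Jue, Ji Shouling. A survey of attack, defense, and security analysis for deep reinforcement learning [深度强化学习的攻防与安全性分析综述]. Acta Automatica Sinica (自动化学报), 2022, 48(1): 21-39.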

English

【Survey of RL techniques: strategies, recent developments, and future directions】Mondal A K, Jamali N. A survey of reinforcement learning techniques: strategies, recent development, and future directions. arXiv preprint arXiv:2001.06921, 2020.
【Survey of automated reinforcement learning (AutoRL)】Parker-Holder J, Rajan R, Song X, et al. Automated Reinforcement Learning (AutoRL): A Survey and Open Problems. arXiv preprint arXiv:2201.03916, 2022.
【Survey of RL for autonomous driving】Kiran B R, Sobh I, Talpaert V, et al. Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems, 2021.
【UCL & UC Berkeley -- survey of generalisation in deep RL】Kirk R, Zhang A, Grefenstette E, et al. A survey of generalisation in deep reinforcement learning. arXiv preprint arXiv:2111.09794, 2021.
【Recent survey of exploration algorithms in deep RL, with nearly 200 references on challenges and future directions】Yang T, Tang H, Bai C, et al. Exploration in deep reinforcement learning: a comprehensive survey. arXiv preprint arXiv:2109.06668, 2021.
【Multi-agent deep RL: a survey】Gronauer S, Diepold K. Multi-agent deep reinforcement learning: a survey. Artificial Intelligence Review, 2022, 55(2): 895-943.
【University of Oxford: 60-page survey of recent advances in RL for finance】Hambly B, Xu R, Yang H. Recent Advances in Reinforcement Learning in Finance. arXiv preprint arXiv:2112.04553, 2021.

Classic Papers

Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: A survey[J]. Journal of artificial intelligence research, 1996, 4: 237-285.
Sutton R S, Barto A G. Reinforcement learning: An introduction[M]. MIT press, 2018.
Wiering M A, Van Otterlo M. Reinforcement learning[J]. Adaptation, learning, and optimization, 2012, 12(3): 729.
Li Y. Deep reinforcement learning: An overview[J]. arXiv preprint arXiv:1701.07274, 2017.
Szepesvári C. Algorithms for reinforcement learning[J]. Synthesis lectures on artificial intelligence and machine learning, 2010, 4(1): 1-103.
Mnih V, Kavukcuoglu K, Silver D, et al. Playing atari with deep reinforcement learning[J]. arXiv preprint arXiv:1312.5602, 2013.
Kober J, Bagnell J A, Peters J. Reinforcement learning in robotics: A survey[J]. The International Journal of Robotics Research, 2013, 32(11): 1238-1274.
Henderson P, Islam R, Bachman P, et al. Deep reinforcement learning that matters[C]//Proceedings of the AAAI conference on artificial intelligence. 2018, 32(1).
Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015.
Sutton R S, Barto A G. Introduction to reinforcement learning[J]. 1998.
Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
Levine S, Finn C, Darrell T, et al. End-to-End Training of Deep Visuomotor Policies[J]. Journal of Machine Learning Research, 2015, 1-40.
Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
Mnih V, Badia A, Mirza M, et al. Asynchronous methods for deep reinforcement learning[C]. In International Conference on Machine Learning, 2016, 1928-1937.
Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of go without human knowledge[J]. Nature, 2017, 550(7676): 354-359.
Silver D, Hubert T, Schrittwieser J, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv: Artificial Intelligence, 2017.
Hutson M. AI takes on video games in quest for common sense[J]. Science, 2018.
Kalashnikov D, Irpan A, Pastor P, et al. Qt-opt: Scalable deep reinforcement learning for vision-based robotic manipulation[J]. arXiv preprint arXiv:1806.10293, 2018.
Shi J C, Yu Y, Da Q, et al. Virtual-taobao: Virtualizing real-world online retail environment for reinforcement learning[C]. Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 4902-4909.
Zeng A, Song S, Lee J, et al. TossingBot: Learning to Throw Arbitrary Objects with Residual Physics[J]. arXiv preprint arXiv:1903.11239, 2019.
OpenAI, https://www.theverge.com/2019/10/15/20914575/openai-dactyl-robotic-hand-rubiks-cube-one-handed-solve-dexterity-ai
Vinyals O, Babuschkin I, Czarnecki W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354.
Seita D, Florence P, Tompson J, et al. Learning to rearrange deformable cables, fabrics, and bags with goal-conditioned transporter networks[C]//2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021: 4568-4575.

Advanced Papers

【TPAMI 2022 -- reinforced, incremental, and cross-lingual social event detection with graph neural networks】Peng H, Zhang R, Li S, et al. Reinforced, Incremental and Cross-lingual Event Detection From Social Messages[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
● Code: https://github.com/RingBDStack/FinEvent
【AAAI 2022 -- unsupervised deformable image registration with a stochastic planner-actor-critic model】Luo Z, Hu J, Wang X, et al. Stochastic Planner-Actor-Critic for Unsupervised Deformable Image Registration[J]. arXiv preprint arXiv:2112.07415, 2021.
【AAAI 2022 -- a robust RL algorithm based on state perturbation】Kuang Y, Lu M, Wang J, et al. Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization[J]. arXiv preprint arXiv:2112.10513, 2021.
【AAAI 2022 -- sample-efficient RL via a conservative model-based actor-critic】Wang Z, Wang J, Zhou Q, et al. Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic[J]. arXiv preprint arXiv:2112.10504, 2021.
【NeurIPS 2021 -- on the optimal use of the experience replay buffer in RL】Liu X H, Xue Z, Pang J, et al. Regret Minimization Experience Replay in Off-Policy Reinforcement Learning[J]. Advances in Neural Information Processing Systems, 2021, 34.
【CIKM 2021 -- exploring knowledge distillation for RL-based recommendation models】Xie R, Zhang S, Wang R, et al. Explore, Filter and Distill: Distilled Reinforcement Learning in Recommendation[C]//Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021: 4243-4252.
【OpenAI and Google Brain -- emergent tool use from multi-agent autocurricula】Baker B, Kanitscheider I, Markov T, et al. Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528, 2019.
● Code: https://github.com/openai/multi-agent-emergence-environments
【Berkeley Artificial Intelligence Research (BAIR) -- efficient off-policy meta-RL via probabilistic context variables】Rakelly K, Zhou A, Finn C, et al. Efficient off-policy meta-reinforcement learning via probabilistic context variables//International conference on machine learning. PMLR, 2019: 5331-5340.
● Code: https://github.com/katerakelly/oyster
【NeurIPS 2019 -- providing supervision during the meta-learning stage】Mendonca R, Gupta A, Kralev R, et al. Guided meta-policy search. Advances in Neural Information Processing Systems, 2019, 32.
【Using a logarithmic mapping to enable lower discount factors in RL】Van Seijen H, Fatemi M, Tavakoli A. Using a logarithmic mapping to enable lower discount factors in reinforcement learning. Advances in Neural Information Processing Systems, 2019, 32.
● Code: https://github.com/microsoft/logrl
● Dataset: Arcade Learning Environment (https://github.com/mgbellemare/Arcade-Learning-Environment)
【Distributional RL for efficient exploration】Mavrin B, Yao H, Kong L, et al. Distributional reinforcement learning for efficient exploration//International conference on machine learning. PMLR, 2019: 4424-4434.
● Dataset: CARLA (https://carla.org/)
【AAAI 2019 Best Paper Award -- how to combine tree-search methods in RL】Efroni Y, Dalal G, Scherrer B, et al. How to combine tree-search methods in reinforcement learning//Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 3494-3501.
【NeurIPS 2019 -- a model-free RL algorithm for continuous control tasks】Ciosek K, Vuong Q, Loftin R, et al. Better exploration with optimistic actor critic. Advances in Neural Information Processing Systems, 2019, 32.
【Algorithms that output policy certificates】Dann C, Li L, Wei W, et al. Policy certificates: Towards accountable reinforcement learning//International Conference on Machine Learning. PMLR, 2019: 1507-1516.
【EMNLP 2016 -- RL for dialogue generation】Li J, Monroe W, Ritter A, et al. Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541, 2016.
● Code: https://github.com/liuyuemaicha/Deep-Reinforcement-Learning-for-Dialogue-Generation-in-tensorflow
【NeurIPS 2017 -- online RL in stochastic games】Wei C Y, Hong Y T, Lu C J. Online reinforcement learning in stochastic games. Advances in Neural Information Processing Systems, 2017, 30.
● Code: https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged
【CVPR 2017 -- self-critical sequence training for image captioning】Rennie S J, Marcheret E, Mroueh Y, et al. Self-critical sequence training for image captioning//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 7008-7024.
● Code: https://github.com/ruotianluo/neuraltalk2.pytorch
● Dataset: COCO (Microsoft Common Objects in Context), https://cocodataset.org/
【ICCV 2017 -- an RL-based image captioning method】Liu S, Zhu Z, Ye N, et al. Improved image captioning via policy gradient optimization of spider//Proceedings of the IEEE international conference on computer vision. 2017: 873-881.
● Code: https://github.com/peteanderson80/SPICE
● Dataset: COCO (Microsoft Common Objects in Context), https://cocodataset.org/
【NIPS 2017 -- safe and nested subgame solving for imperfect-information games】Brown N, Sandholm T. Safe and nested subgame solving for imperfect-information games. Advances in neural information processing systems, 2017, 30.
【WWW 2018 -- learning to collaborate: multi-scenario ranking via multi-agent RL】Feng J, Li H, Huang M, et al. Learning to collaborate: Multi-scenario ranking via multi-agent reinforcement learning//Proceedings of the 2018 World Wide Web Conference. 2018: 1939-1948.
【SIGCOMM 2017 -- optimizing adaptive bitrate (ABR) algorithms with RL】Mao H, Netravali R, Alizadeh M. Neural adaptive video streaming with pensieve//Proceedings of the Conference of the ACM Special Interest Group on Data Communication. 2017: 197-210.
● Code: https://github.com/thu-media/Comyco
【ReasoNet, a machine comprehension model】Shen Y, Huang P S, Gao J, et al. Reasonet: Learning to stop reading in machine comprehension//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017: 1047-1055.
【NIPS 2016 -- dual learning for machine translation】He D, Xia Y, Qin T, et al. Dual learning for machine translation. Advances in neural information processing systems, 2016, 29.
● Code: https://github.com/NonameAuPlatal/Dual_Learning
【IJCAI 2017 -- reinforcement mechanism design】Tang P. Reinforcement mechanism design//IJCAI. 2017: 5146-5150.
【Tuning recurrent neural networks with RL】Jaques N, Gu S, Turner R E, et al. Tuning recurrent neural networks with reinforcement learning. 2017.
【WSDM 2018 -- curriculum learning for heterogeneous star network embedding via deep RL】Qu M, Tang J, Han J. Curriculum learning for heterogeneous star network embedding via deep reinforcement learning//Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 2018: 468-476.

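Dissertations

【Doctoral dissertation of Ji Shenggong (纪圣塨), Southwest Jiaotong University】Intelligent Optimization Methods for Urban Resources and Their Applications [城市资源智能优化方法及应用研究], https://www.zhuanzhi.ai/vip/bd3586a9b4f9d38ab10678db5f708485
【Doctoral dissertation of Chen Lu (陈露), Shanghai Jiao Tong University】Dialogue Management in Cognitive Spoken Interaction Systems [认知型口语交互系统中的对话管理技术], https://www.zhuanzhi.ai/vip/8d5844de744b4e287a61a80d72ee1190

Frameworks/Datasets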

【OpenAI--Baselines】
● https://github.com/openai/baselines
● Reproduces many classic RL algorithms
【OpenAI--spinningup】
● https://spinningup.openai.com/en/latest/user/introduction.html
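● Provides reproductions of classic policy-based algorithms. Pros: clearly written, easy to pick up, reliable results, and support for both TensorFlow and PyTorch. Cons: there are no value-based algorithms, so it cannot be used to develop the DQN family.
【Baidu--PARL】
● https://github.com/paddlepaddle/parl
● Highly extensible, reproducible, and user-friendly
【DeepMind--OpenSpiel】
● https://github.com/deepmind/open_spiel
● OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.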
【Intel AI LAB--Coach】
● https://github.com/IntelLabs/coach
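● Coach is a Python RL framework containing implementations of many state-of-the-art algorithms. Its design is highly modular, covering the overall workflow, algorithm module definitions, network definitions, exploration policy definitions, and so on.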
【Google--dopamine】
● https://github.com/google/dopamine
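● Dopamine is a research framework for fast prototyping of RL algorithms. It aims to fill the need for a small, easily understood codebase in which users can freely experiment with wild ideas (speculative research).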
【Agent Learning Framework(ALF)】
● https://github.com/HorizonRobotics/alf
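● Agent Learning Framework (ALF) is an RL framework that emphasizes flexibility and ease of implementing complex algorithms involving many different components. ALF is built on PyTorch.
【Tsinghua University Institute for Artificial Intelligence--Tianshou】
● https://github.com/thu-ml/tianshou
● Tianshou (天授) is an RL platform based on pure PyTorch. Unlike existing RL libraries, which are mainly based on TensorFlow and have many nested classes, unfriendly APIs, or slow speed, Tianshou provides a fast, modular framework and a Pythonic API for building deep RL agents with as few lines of code as possible.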
【MuJoCo】
● https://mujoco.org/
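● A physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas that need fast and accurate simulation. MuJoCo offers a unique combination of speed, accuracy, and modeling power, yet it is not merely a better simulator: it is the first full-featured simulator designed from the ground up for model-based optimization, in particular optimization through contacts.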
【The Arcade Learning Environment (ALE)】
● https://github.com/mgbellemare/Arcade-Learning-Environment
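● ALE is a simple framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games. It is built on top of the Atari 2600 emulator Stella and separates the details of emulation from agent design. ALE currently supports more than 50 games.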
【CARLA】
● https://carla.org/
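● CARLA has been developed from the ground up to support the development, training, and validation of autonomous driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely. The simulation platform supports flexible specification of sensor suites, environmental conditions, full control of all static and dynamic actors, map generation, and more.

Reports/White Papers

2021.5, "Transforming healthcare with Reinforcement Learning", https://f.hubspotusercontent10.net/hubfs/1868764/EU%20Whitepapers_cases_reports/Transforming%20healthcare%20with%20Reinforcement%20Learning%20White%20Paper.pdf
2017.11, "Google TPU and Reinforcement Learning" [谷歌 TPU 及强化学习], http://pdf.dfcfw.com/pdf/H3_AP201712051062442205_1.PDF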

Domain Experts

Yong Yu (俞勇)--Shanghai Jiao Tong University [https://apex.sjtu.edu.cn/members/yyu ]
Yang Yu (俞扬)--Nanjing University [https://www.yuque.com/eyounx/home ]
Fei-Fei Li (李飞飞)--Member of the US National Academy of Engineering [https://profiles.stanford.edu/fei-fei-li ]
Alekh Agarwal--Google [https://alekhagarwal.net/ ]
Sergey Levine--UC Berkeley [https://people.eecs.berkeley.edu/~svlevine/ ]
Pieter Abbeel--UC Berkeley [https://people.eecs.berkeley.edu/~pabbeel/ ]
David Silver--DeepMind/University College London [https://www.davidsilver.uk/ ]
Rémi Munos--DeepMind [http://researchers.lille.inria.fr/munos/ ]
Chelsea Finn--Stanford University [https://ai.stanford.edu/~cbfinn/ ]
Jianfeng Gao (高剑峰)--Microsoft [https://www.microsoft.com/en-us/research/people/jfgao/ ]
Timothy Lillicrap--DeepMind/Carnegie Mellon University [https://contrastiveconvergence.net/~timothylillicrap/index.php ]
Frank L. Lewis--University of Texas at Arlington [https://www.uta.edu/academics/faculty/profile?username=flewis ]
Jonathan P. How--Massachusetts Institute of Technology [https://www.mit.edu/~jhow/ ]
Koray Kavukcuoglu--DeepMind [https://koray.kavukcuoglu.org/ ]
Peter Herald Stone--University of Texas at Austin [https://www.cs.utexas.edu/~pstone/ ]

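Resource Compilations

【Code, exercises, and solutions for popular RL algorithms】Denny Britz, Google, https://github.com/dennybritz/reinforcement-learning
【Deep RL from beginner to expert: a comprehensive 2020 resource roundup】by Yue Longfei (岳龙飞), DeepRL-Lab (深度强化学习实验室), https://aijishu.com/a/1060000000091025
【Deep Reinforcement Learning repository】DeepRL-Lab (深度强化学习实验室), https://github.com/neurondance/deeprl
【RL from getting started to giving up: a compilation of RL learning resources】updated through 2019, https://taospirit.github.io/2019/04/15/%E5%BC%BA%E5%8C%96%E5%AD%A6%E4%B9%A0%E5%85%A5%E9%97%A8/

This is a preliminary version and our expertise is limited; suggestions and additions for any errors or omissions are welcome, and the list will be kept up to date. This article is original content by the Zhuanzhi content team and may not be reposted without permission. To repost, email fangquanyi@gmail.com or contact the Zhuanzhi assistant on WeChat (Rancho_Fang).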
