Deep reinforcement learning has been one of the fastest-growing fields of machine learning in recent years, and numerous libraries have been open-sourced to support research. However, most codebases have a steep learning curve or limited flexibility that does not satisfy the need for fast prototyping in fundamental research. This paper introduces Tonic, a Python library allowing researchers to quickly implement new ideas and measure their impact by providing: 1) general-purpose configurable modules; 2) several baseline agents built with these modules: A2C, TRPO, PPO, MPO, DDPG, D4PG, TD3, and SAC; 3) support for TensorFlow 2 and PyTorch; 4) support for continuous-control environments from OpenAI Gym, DeepMind Control Suite, and PyBullet; 5) scripts to run experiments reproducibly, plot results, and play with trained agents; 6) a benchmark of the provided agents on 70 continuous-control tasks. Evaluation is performed under fair conditions, with identical seeds and shared training and testing loops, as well as shared general improvements such as non-terminal timeouts and observation normalization. Finally, to demonstrate how Tonic simplifies experimentation, a novel agent called TD4 is implemented and evaluated.
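To illustrate the intended workflow, the sketch below shows how one of the provided baselines might be trained on a Gym task. It assumes a `tonic.train` entry point mirroring the library's documented `python3 -m tonic.train` command-line script; the exact argument names are illustrative and may differ between versions.

```python
# Minimal sketch of a Tonic experiment, assuming a tonic.train entry
# point that mirrors the command-line script
#   python3 -m tonic.train --header ... --agent ... --environment ...
# Argument names here are illustrative, not a definitive API reference.
import tonic

tonic.train(
    # Selects the backend: 'import tonic.torch' or 'import tonic.tensorflow'.
    header='import tonic.torch',
    # Any of the provided baseline agents, built from the configurable modules.
    agent='tonic.torch.agents.PPO()',
    # Wrappers cover OpenAI Gym, DeepMind Control Suite, and PyBullet tasks.
    environment="tonic.environments.Gym('BipedalWalker-v3')",
    # Fixing the seed keeps the run reproducible.
    seed=0,
)
```

The companion scripts mentioned in the abstract are exposed in the same module style, e.g. `python3 -m tonic.plot` to visualize logged results and `python3 -m tonic.play` to render a trained agent.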