[1] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
[2] Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
[3] Silver, David, et al. "Mastering the game of Go without human knowledge." Nature 550.7676 (2017): 354-359.
[4] Levine, Sergey, et al. "End-to-end training of deep visuomotor policies." arXiv preprint arXiv:1504.00702 (2015).
[5] Mao, Hongzi, et al. "Resource management with deep reinforcement learning." Proceedings of the 15th ACM Workshop on Hot Topics in Networks. ACM, 2016.
[6] deepmind.com/blog/deepm
[7] Jaques, Natasha, et al. "Tuning recurrent neural networks with reinforcement learning." (2017).
[8] Henderson, Peter, et al. "Deep reinforcement learning that matters." arXiv preprint arXiv:1709.06560 (2017).
[9] Islam, Riashat, et al. "Reproducibility of benchmarked deep reinforcement learning tasks for continuous control." arXiv preprint arXiv:1708.04133 (2017).
[10] riashatislam.files.wordpress.com
[11] sites.google.com/view/d
[12] reddit.com/r/MachineLea
[13] alexirpan.com/2018/02/1
[14] himanshusahni.github.io
[15] amid.fish/reproducing-d
[16] rodeo.ai/2018/05/06/rep
[17] Dayan, Peter, and Yael Niv. "Reinforcement learning: the good, the bad and the ugly." Current Opinion in Neurobiology 18.2 (2008): 185-196.
[18] argmin.net/2018/05/11/o
[19] Mania, Horia, Aurelia Guy, and Benjamin Recht. "Simple random search provides a competitive approach to reinforcement learning." arXiv preprint arXiv:1803.07055 (2018).
[20] Justesen, Niels, et al. "Deep Learning for Video Game Playing." arXiv preprint arXiv:1708.07902 (2017).
[21] youtube.com/watch?
[22] sites.google.com/view/i
[23] Silver, David, et al. "The predictron: End-to-end learning and planning." arXiv preprint arXiv:1612.08810 (2016).
[24] Deisenroth, Marc, and Carl E. Rasmussen. "PILCO: A model-based and data-efficient approach to policy search." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
[25] Levine, Sergey, and Vladlen Koltun. "Guided policy search." International Conference on Machine Learning. 2013.
[26] Weber, Théophane, et al. "Imagination-augmented agents for deep reinforcement learning." arXiv preprint arXiv:1707.06203 (2017).
[27] Sutton, Richard S. "Dyna, an integrated architecture for learning, planning, and reacting." ACM SIGART Bulletin 2.4 (1991): 160-163.
[28] Silver, David, Richard S. Sutton, and Martin Müller. "Sample-based learning and search with permanent and transient memories." Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
[29] Rajeswaran, Aravind, et al. "Towards generalization and simplicity in continuous control." Advances in Neural Information Processing Systems. 2017.
[30] andreykurenkov.com/writ
[31] UCL Course on RL: www0.cs.ucl.ac.uk/staff
[32] Powell, Warren B. "AI, OR and control theory: A Rosetta Stone for stochastic optimization." Princeton University (2012).
[33] Pomerleau, Dean A. "ALVINN: An autonomous land vehicle in a neural network." Advances in Neural Information Processing Systems. 1989.
[34] Abbeel, Pieter, and Andrew Y. Ng. "Apprenticeship learning via inverse reinforcement learning." Proceedings of the Twenty-first International Conference on Machine Learning. ACM, 2004.
[35] Osa, Takayuki, et al. "An algorithmic perspective on imitation learning." Foundations and Trends® in Robotics 7.1-2 (2018): 1-179.
[36] fermatslibrary.com/arxi url=https://arxiv.org/pdf/1406.2661.pdf
[37] Ho, Jonathan, and Stefano Ermon. "Generative adversarial imitation learning." Advances in Neural Information Processing Systems. 2016.
[38] github.com/carla-simula
[39] 36kr.com/p/5129474.html
[40] Vezhnevets, Alexander Sasha, et al. "Feudal networks for hierarchical reinforcement learning." arXiv preprint arXiv:1703.01161 (2017).
[41] Ranzato, Marc'Aurelio, et al. "Sequence level training with recurrent neural networks." arXiv preprint arXiv:1511.06732 (2015).
[42] Bahdanau, Dzmitry, et al. "An actor-critic algorithm for sequence prediction." arXiv preprint arXiv:1607.07086 (2016).
[43] Keneshloo, Yaser, et al. "Deep Reinforcement Learning For Sequence to Sequence Models." arXiv preprint arXiv:1805.09461 (2018).
[44] Watters, Nicholas, et al. "Visual interaction networks." arXiv preprint arXiv:1706.01433 (2017).
[45] Hamrick, Jessica B., et al. "Relational inductive bias for physical construction in humans and machines." arXiv preprint arXiv:1806.01203 (2018).
[46] Zambaldi, Vinicius, et al. "Relational Deep Reinforcement Learning." arXiv preprint arXiv:1806.01830 (2018).
[47] Santoro, Adam, et al. "Relational recurrent neural networks." arXiv preprint arXiv:1806.01822 (2018).
[48] Battaglia, Peter W., et al. "Relational inductive biases, deep learning, and graph networks." arXiv preprint arXiv:1806.01261 (2018).
[49] Eslami, SM Ali, et al. "Neural scene representation and rendering." Science 360.6394 (2018): 1204-1210.
[50] Huang, Sandy, et al. "Adversarial attacks on neural network policies." arXiv preprint arXiv:1702.02284 (2017).
[51] Behzadan, Vahid, and Arslan Munir. "Vulnerability of deep reinforcement learning to policy induction attacks." International Conference on Machine Learning and Data Mining in Pattern Recognition. Springer, Cham, 2017.
[52] Tesauro, Gerald. "Temporal difference learning and TD-Gammon." Communications of the ACM 38.3 (1995): 58-68.
[53] Jones, Rebecca M., et al. "Behavioral and neural properties of social reinforcement learning." Journal of Neuroscience 31.37 (2011): 13039-13045.
[54] github.com/YBIGTA/DeepN
[55] Lewis, Mike, et al. "Deal or no deal? end-to-end learning for negotiation dialogues." arXiv preprint arXiv:1706.05125 (2017).
[56] microsoft.com/zh-cn/tra
[57] Derhami, Vali, et al. "Applying reinforcement learning for web pages ranking algorithms." Applied Soft Computing 13.4 (2013): 1686-1692.
[58] Wu, Jiawei, Lei Li, and William Yang Wang. "Reinforced Co-Training." arXiv preprint arXiv:1804.06035 (2018).
[59] Li, Yuxi. "Deep reinforcement learning: An overview." arXiv preprint arXiv:1701.07274 (2017).
[60] Feinberg, Eugene A., and Adam Shwartz, eds. Handbook of Markov decision processes: methods and applications. Vol. 40. Springer Science & Business Media, 2012.
[61] en.wikipedia.org/wiki/C
[62] Turing, Alan M. "Computing machinery and intelligence." Parsing the Turing Test. Springer, Dordrecht, 2009. 23-65.
[63] en.wikipedia.org/wiki/A
[64] Russell, Stuart J., and Peter Norvig. Artificial intelligence: a modern approach. Pearson Education Limited, 2016.
[65] Noë, Alva. Action in perception. MIT press, 2004.
[66] Garnelo, Marta, Kai Arulkumaran, and Murray Shanahan. "Towards deep symbolic reinforcement learning." arXiv preprint arXiv:1609.05518 (2016).
[67] argmin.net/2018/04/16/e
[68] Bojarski, Mariusz, et al. "End to end learning for self-driving cars." arXiv preprint arXiv:1604.07316 (2016).
[69] Recht, Benjamin. "A Tour of Reinforcement Learning: The View from Continuous Control." arXiv preprint arXiv:1806.09460 (2018).
[70] Kalashnikov, Dmitry, et al. "QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation." arXiv preprint arXiv:1806.10293 (2018).
[71] Tan, Jie, et al. "Sim-to-Real: Learning Agile Locomotion For Quadruped Robots." arXiv preprint arXiv:1804.10332 (2018).
[72] Auer, Peter. "Using confidence bounds for exploitation-exploration trade-offs." Journal of Machine Learning Research 3.Nov (2002): 397-422.
[73] Agrawal, Shipra, and Navin Goyal. "Thompson sampling for contextual bandits with linear payoffs." International Conference on Machine Learning. 2013.
[74] Mohamed, Shakir, and Danilo Jimenez Rezende. "Variational information maximisation for intrinsically motivated reinforcement learning." Advances in Neural Information Processing Systems. 2015.
[75] Pathak, Deepak, et al. "Curiosity-driven exploration by self-supervised prediction." International Conference on Machine Learning (ICML). 2017.
[76] Tang, Haoran, et al. "#Exploration: A study of count-based exploration for deep reinforcement learning." Advances in Neural Information Processing Systems. 2017.
[77] McFarlane, Roger. "A Survey of Exploration Strategies in Reinforcement Learning." McGill University, www.cs.mcgill.ca/~cs526/roger.pdf, accessed April 2018.
[78] Plappert, Matthias, et al. "Parameter space noise for exploration." arXiv preprint arXiv:1706.01905 (2017).
[79] Fortunato, Meire, et al. "Noisy networks for exploration." arXiv preprint arXiv:1706.10295 (2017).
[80] Kansky, Ken, et al. "Schema networks: Zero-shot transfer with a generative causal model of intuitive physics." arXiv preprint arXiv:1706.04317 (2017).
[81] Li, Da, et al. "Learning to generalize: Meta-learning for domain generalization." arXiv preprint arXiv:1710.03463 (2017).
[82] berkeleyautomation.github.io
[83] contest.openai.com/2018