ICML 2019: A Roundup of Deep Reinforcement Learning Papers

May 10, 2019, Deep Reinforcement Learning Lab

Deep Reinforcement Learning Report

Source: ICML 2019 conference

Editor: DeepRL

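Reinforcement learning (RL) is a general paradigm for learning, prediction, and decision making. RL provides solution methods for sequential decision-making problems, as well as for problems that can be transformed into sequential ones. RL has deep connections with optimization, statistics, game theory, causal inference, and sequential experimentation, overlaps substantially with approximate dynamic programming and optimal control, and has wide applications in science, engineering, and the arts.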

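RL has made steady progress in academia recently, with results such as Atari games, AlphaGo, and visuomotor policies for robots. RL has also been applied to real-world scenarios such as recommender systems and neural architecture search; see recent collections of RL applications. The hope is that RL systems will work in the real world and deliver practical benefits. However, RL still faces many open problems, such as generalization, sample efficiency, and the exploration-exploitation dilemma. As a result, RL is far from being widely deployed. The common, critical, and pressing questions for the RL community are: Will RL see wide deployment? What are the obstacles? How can they be solved?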

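The International Conference on Machine Learning (ICML) is an international academic conference on machine learning. It is one of the two main high-impact conferences in machine learning and artificial intelligence research. Every year ICML features a large number of reinforcement learning papers; in 2019 it accepted 46 reinforcement learning papers in total (already a high share, approaching 10%). Below is a summary of the papers from this year's conference. A download link for the collected PDF versions is at the end of this article.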

Methods

Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
Quantifying Generalization in Reinforcement Learning
Policy Certificates: Towards Accountable Reinforcement Learning
Neural Logic Reinforcement Learning
Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning
Few-Shot Intent Inference via Meta-Inverse Reinforcement Learning
Calibrated Model-Based Deep Reinforcement Learning
Information-Theoretic Considerations in Batch Reinforcement Learning
Taming MAML: Control variates for unbiased meta-reinforcement learning gradient estimation
Option Discovery for Solving Sparse Reward Reinforcement Learning Problems

Optimization

Fingerprint Policy Optimisation for Robust Reinforcement Learning
Collaborative Evolutionary Reinforcement Learning
Composing Value Functions in Reinforcement Learning
Task-Agnostic Dynamics Priors for Deep Reinforcement Learning
Policy Consolidation for Continual Reinforcement Learning

Exploration-Exploitation and Model Parameters

Exploration Conscious Reinforcement Learning Revisited
Dynamic Weights in Multi-Objective Deep Reinforcement Learning
Control Regularization for Reduced Variance Reinforcement Learning
Dead-ends and Secure Exploration in Reinforcement Learning
Off-Policy Deep Reinforcement Learning without Exploration
Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
On the Generalization Gap in Reparameterizable Reinforcement Learning

Multi-Agent

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning
Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning
Maximum Entropy-Regularized Multi-Goal Reinforcement Learning
Multi-Agent Adversarial Inverse Reinforcement Learning
Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI
QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
Actor-Attention-Critic for Multi-Agent Reinforcement Learning

Graphical-Model Reinforcement Learning

TibGM: A Transferable and Information-Based Graphical Model Approach for Reinforcement Learning
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

Distributional Reinforcement Learning

Statistics and Samples in Distributional Reinforcement Learning
Distributional Reinforcement Learning for Efficient Exploration

Applications

Action Robust Reinforcement Learning and Applications in Continuous Control
Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation
Learning Action Representations for Reinforcement Learning
The Value Function Polytope in Reinforcement Learning
Generative Adversarial User Model for Reinforcement Learning Based Recommendation System

Other

Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
A Deep Reinforcement Learning Perspective on Internet Congestion Control
Reinforcement Learning in Configurable Continuous Environments
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds

Note: some of these papers are not yet on arXiv. For any that are missing, please search for them on Google.

PDF versions of the papers (resource download)