" 噪音免费午餐:代表学习的可行和实际探索 " (A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning)

Representation learning lies at the heart of the empirical success of deep learning for dealing with the curse of dimensionality. However, the power of representation learning has not been fully exploited yet in reinforcement learning (RL), due to i), the trade-off between expressiveness and tractability; and ii), the coupling between exploration and representation learning. In this paper, we first reveal the fact that under some noise assumption in the stochastic control model, we can obtain the linear spectral feature of its corresponding Markov transition operator in closed-form for free. Based on this observation, we propose Spectral Dynamics Embedding (SPEDE), which breaks the trade-off and completes optimistic exploration for representation learning by exploiting the structure of the noise. We provide rigorous theoretical analysis of SPEDE, and demonstrate the practical superior performance over the existing state-of-the-art empirical algorithms on several benchmarks.

翻译：代表制学习是深思熟虑处理维度诅咒的经验成功的核心,然而,代表制学习的力量尚未在强化学习(RL)中得到充分利用(RL),原因是(i),表达性和可移动性之间的权衡;和(ii),探索和代表制学习的结合。在本文中,我们首先揭示了一个事实,即根据随机控制模型中的一些噪音假设,我们可以免费获得相应的Markov过渡运营商的线性光谱特征。基于这一观察,我们提议光谱动态嵌入(SPEDE),通过利用噪音的结构,打破交易,完成对代表制学习的乐观探索。我们对SPEDE进行严格的理论分析,并表明在几个基准上比现有最新经验算法的实际优于现有水平。

相关内容

表示学习

关注 186

表示学习是通过利用训练数据来学习得到向量表示，这可以克服人工方法的局限性。表示学习通常可分为两大类，无监督和有监督表示学习。大多数无监督表示学习方法利用自动编码器（如去噪自动编码器和稀疏自动编码器等）中的隐变量作为表示。目前出现的变分自动编码器能够更好的容忍噪声和异常值。然而，推断给定数据的潜在结构几乎是不可能的。目前有一些近似推断的策略。此外，一些无监督表示学习方法旨在近似某种特定的相似性度量。提出了一种无监督的相似性保持表示学习框架，该框架使用矩阵分解来保持成对的DTW相似性。通过学习保持DTW的shaplets，即在转换后的空间中的欧式距离近似原始数据的真实DTW距离。有监督表示学习方法可以利用数据的标签信息，更好地捕获数据的语义结构。孪生网络和三元组网络是目前两种比较流行的模型，它们的目标是最大化类别之间的距离并最小化了类别内部的距离。

【因果基础】Causality Basics，36页ppt

专知会员服务

52+阅读 · 2021年8月8日

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日