Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL (Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL)

We study the design of sample-efficient algorithms for reinforcement learning in the presence of rich, high-dimensional observations, formalized via the Block MDP problem. Existing algorithms suffer from either 1) computational intractability, 2) strong statistical assumptions that are not necessarily satisfied in practice, or 3) suboptimal sample complexity. We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions. Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics, a learning objective in which the aim is to predict the learner's own action from the current observation and observations in the (potentially distant) future. MusIK is simple and flexible, and can efficiently take advantage of general-purpose function approximation. Our analysis leverages several new techniques tailored to non-optimistic exploration algorithms, which we anticipate will find broader use.

翻译：使用多步逆运动学的表示学习: 丰富观测强化学习的高效和最优方法我们研究在复杂、高维观测下的强化学习的样本高效算法设计，通过Block MDP问题进行形式化。现有算法存在三个问题：1) 计算难度大；2) 强的假设在实践中并不一定成立；3) 样本复杂度低效。我们通过提供第一个计算效率高的算法解决这些问题，并且对被要求的精度水平获得最优样本复杂度，且假设尽可能少。我们的算法，MusIK，结合了系统性探索和基于多步逆运动学的表示学习。多步逆运动学是一种学习目标，从当前观察和未来（可能是远的）观察预测学习者自己的行动。MusIK简单灵活，并且可以有效地利用通用的函数逼近。我们的分析利用了针对非乐观探索算法的几种新技术，我们预计这些技术会得到更广泛的应用。

相关内容

表示学习

关注 186

表示学习是通过利用训练数据来学习得到向量表示，这可以克服人工方法的局限性。表示学习通常可分为两大类，无监督和有监督表示学习。大多数无监督表示学习方法利用自动编码器（如去噪自动编码器和稀疏自动编码器等）中的隐变量作为表示。目前出现的变分自动编码器能够更好的容忍噪声和异常值。然而，推断给定数据的潜在结构几乎是不可能的。目前有一些近似推断的策略。此外，一些无监督表示学习方法旨在近似某种特定的相似性度量。提出了一种无监督的相似性保持表示学习框架，该框架使用矩阵分解来保持成对的DTW相似性。通过学习保持DTW的shaplets，即在转换后的空间中的欧式距离近似原始数据的真实DTW距离。有监督表示学习方法可以利用数据的标签信息，更好地捕获数据的语义结构。孪生网络和三元组网络是目前两种比较流行的模型，它们的目标是最大化类别之间的距离并最小化了类别内部的距离。

终身学习如何构建？NeurIPS2022《终身学习机》教程，70页ppt

专知会员服务

46+阅读 · 2023年1月26日

斯坦福大学最新【强化学习】2022课程，含ppt

专知会员服务

131+阅读 · 2022年2月27日

Into the Metaverse，93页ppt介绍元宇宙概念、应用、趋势

专知会员服务

49+阅读 · 2022年2月19日

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日