批量增值功能接近率仅实现 (Batch Value-function Approximation with Only Realizability) - 专知论文

会员服务 ·

0

Extensibility · 学成 · 可约的 · 近似 · 成对型 ·

2021 年 6 月 17 日

Batch Value-function Approximation with Only Realizability

翻译：批量增值功能接近率仅实现

Tengyang Xie,Nan Jiang

from arxiv, Published in ICML 2021

We make progress in a long-standing problem of batch reinforcement learning (RL): learning $Q^\star$ from an exploratory and polynomial-sized dataset, using a realizable and otherwise arbitrary function class. In fact, all existing algorithms demand function-approximation assumptions stronger than realizability, and the mounting negative evidence has led to a conjecture that sample-efficient learning is impossible in this setting (Chen and Jiang, 2019). Our algorithm, BVFT, breaks the hardness conjecture (albeit under a stronger notion of exploratory data) via a tournament procedure that reduces the learning problem to pairwise comparison, and solves the latter with the help of a state-action partition constructed from the compared functions. We also discuss how BVFT can be applied to model selection among other extensions and open problems.

翻译：我们在一个长期存在的批量强化学习(RL)问题上取得了进展:利用一个可实现的、其他任意的功能类,从一个探索性和多元规模的数据集中学习 $star $star $$ 。事实上,所有现有的算法都要求功能匹配假设比真实性强,而越来越多的负面证据导致一种推测,即在这一背景下不可能进行抽样高效学习(Chen和Jiang, 2019年)。我们的算法,BVFT,通过一个将学习问题减少到对比的竞技程序打破了硬性猜想(尽管探索性数据的概念更强 ), 并且借助从比较函数中构建的国家行动分割来解决这一问题。我们还讨论了BVFT如何在其它扩展和公开问题中应用模型选择。

0

相关内容

Extensibility

iOS 8 提供的应用间和应用跟系统的功能交互特性。

Today (iOS and OS X): widgets for the Today view of Notification Center
Share (iOS and OS X): post content to web services or share content with others
Actions (iOS and OS X): app extensions to view or manipulate inside another app
Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from a third-party apps
Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
Custom Keyboard (iOS): system-wide alternative keyboards

Source: iOS 8 Extensions: Apple’s Plan for a Powerful App Ecosystem

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

机器学习组合优化

机器学习组合优化

专知会员服务

110+阅读 · 2021年2月16日

【WSDM 2020 论文】网络嵌入的初始化：一种图划分方法（Initialization for Network Embedding: A Graph Partition Approach）

【WSDM 2020 论文】网络嵌入的初始化：一种图划分方法（Initialization for Network Embedding: A Graph Partition Approach）

专知会员服务

44+阅读 · 2019年11月20日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

已删除

将门创投

4+阅读 · 2018年11月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation

Arxiv

0+阅读 · 2021年8月19日

Inverse design optimization framework via a two-step deep learning approach: application to a wind turbine airfoil

Arxiv

0+阅读 · 2021年8月19日

Designing Approximately Optimal Search on Matching Platforms

Designing Approximately Optimal Search on Matching Platforms

Arxiv

0+阅读 · 2021年8月18日

On contraction coefficients, partial orders and approximation of capacities for quantum channels

Arxiv

0+阅读 · 2021年8月18日

A Game-Theoretic Approach to Self-Stabilization with Selfish Agents

Arxiv

0+阅读 · 2021年8月16日

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Arxiv

9+阅读 · 2021年2月23日

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

Arxiv

11+阅读 · 2018年12月6日

(FPT-)Approximation Algorithms for the Virtual Network Embedding Problem

Arxiv

4+阅读 · 2018年3月12日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

The challenge of simultaneous object detection and pose estimation: a comparative study

Arxiv

6+阅读 · 2018年1月24日

VIP会员

文章信息

相关主题

相关VIP内容

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

机器学习组合优化

机器学习组合优化

专知会员服务

110+阅读 · 2021年2月16日

【WSDM 2020 论文】网络嵌入的初始化：一种图划分方法（Initialization for Network Embedding: A Graph Partition Approach）

【WSDM 2020 论文】网络嵌入的初始化：一种图划分方法（Initialization for Network Embedding: A Graph Partition Approach）

专知会员服务

44+阅读 · 2019年11月20日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【伯克利博士论文】通过真实世界实践赋能机器人自主性

军用无人机集群技术尚未成熟——但潜力可期

人工智能安全治理白皮书（2025）

AgentOps综述：分类、挑战与未来方向

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

已删除

将门创投

4+阅读 · 2018年11月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation

Arxiv

0+阅读 · 2021年8月19日

Inverse design optimization framework via a two-step deep learning approach: application to a wind turbine airfoil

Arxiv

0+阅读 · 2021年8月19日

Designing Approximately Optimal Search on Matching Platforms

Designing Approximately Optimal Search on Matching Platforms

Arxiv

0+阅读 · 2021年8月18日

On contraction coefficients, partial orders and approximation of capacities for quantum channels

Arxiv

0+阅读 · 2021年8月18日

A Game-Theoretic Approach to Self-Stabilization with Selfish Agents

Arxiv

0+阅读 · 2021年8月16日

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Arxiv

9+阅读 · 2021年2月23日

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

Arxiv

11+阅读 · 2018年12月6日

(FPT-)Approximation Algorithms for the Virtual Network Embedding Problem

Arxiv

4+阅读 · 2018年3月12日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

The challenge of simultaneous object detection and pose estimation: a comparative study

Arxiv

6+阅读 · 2018年1月24日

微信扫码咨询专知VIP会员