Monte Carlo (MC) methods are the most widely used methods for estimating the performance of a policy. Given a policy of interest, MC methods produce estimates by repeatedly running this policy to collect samples and averaging the outcomes. Samples collected during this process are called online samples. To obtain an accurate estimate, MC methods consume a massive number of online samples. When online samples are expensive, e.g., in online recommendation and inventory management, we want to reduce the number of online samples while achieving the same estimation accuracy. To this end, we use off-policy MC methods that evaluate the policy of interest by running a different policy, called the behavior policy. We design a tailored behavior policy such that the variance of the off-policy MC estimator is provably smaller than that of the ordinary MC estimator. Importantly, this tailored behavior policy can be efficiently learned from existing offline data, i.e., previously logged data, which are much cheaper than online samples. With reduced variance, our off-policy MC method requires fewer online samples than the ordinary MC method to evaluate the performance of a policy. Moreover, our off-policy MC estimator is always unbiased.
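To make the on-policy versus off-policy distinction concrete, here is a minimal sketch (not the paper's algorithm or its tailored behavior policy) of ordinary MC estimation versus an importance-sampled off-policy MC estimate on a toy one-step bandit. The target policy `pi`, behavior policy `mu`, and reward model below are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: 3 actions with noisy rewards (assumed for illustration).
reward_mean = np.array([1.0, 0.0, 2.0])

def sample_reward(a):
    return reward_mean[a] + rng.normal(scale=0.5)

pi = np.array([0.2, 0.1, 0.7])   # target policy we want to evaluate
mu = np.array([0.3, 0.1, 0.6])   # behavior policy used to collect samples

true_value = float(pi @ reward_mean)
n = 10_000

# Ordinary (on-policy) MC: run pi itself and average the outcomes.
a_on = rng.choice(3, size=n, p=pi)
on_policy_est = np.mean([sample_reward(a) for a in a_on])

# Off-policy MC: run mu, reweight each outcome by the importance ratio pi/mu.
# This keeps the estimator unbiased regardless of the choice of mu (as long
# as mu covers every action that pi can take).
a_off = rng.choice(3, size=n, p=mu)
ratios = pi[a_off] / mu[a_off]
off_policy_est = np.mean([w * sample_reward(a) for w, a in zip(ratios, a_off)])

print(f"true value        : {true_value:.3f}")
print(f"on-policy MC      : {on_policy_est:.3f}")
print(f"off-policy MC (IS): {off_policy_est:.3f}")
```

In this sketch `mu` is fixed by hand; the point of the paper's approach is instead to learn a tailored behavior policy from offline logged data so that the importance-weighted estimator has provably lower variance than the on-policy average.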