Applying reinforcement learning (RL) methods to robots typically involves training a policy in simulation and deploying it on a robot in the real world. Because of the model mismatch between the real world and the simulator, RL agents deployed in this manner tend to perform suboptimally. To tackle this problem, researchers have developed robust policy learning algorithms that rely on synthetic noise disturbances. However, such methods do not guarantee performance in the target environment. We propose a convex risk minimization algorithm to estimate the model mismatch between the simulator and the target domain using trajectory data from both environments. We show that this estimator can be used together with the simulator to evaluate the performance of an RL agent in the target domain, effectively bridging the gap between the two environments. We also show that the convergence rate of our estimator is of the order of $n^{-1/4}$, where $n$ is the number of training samples. In simulation, we demonstrate how our method effectively approximates and evaluates performance across a range of policies in the Gridworld, Cartpole, and Reacher environments. We also show that our method is able to estimate the performance of a 7-DOF robotic arm using the simulator and remotely collected data from the robot in the real world.
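To make the high-level idea concrete, the sketch below shows one possible instantiation of mismatch estimation via convex risk minimization: a logistic classifier trained to distinguish simulator transitions from real-world transitions (the logistic loss is convex, and its minimizer recovers a transition-level density ratio), whose output is then used to reweight simulator rollouts when evaluating a policy's target-domain return. The function names (`fit_mismatch_estimator`, `mismatch_ratio`, `evaluate_policy_in_target`), the feature representation, and the trajectory-level weighting scheme are illustrative assumptions, not the paper's exact estimator.

```python
# Hypothetical sketch: estimate a transition-level mismatch ratio with a
# convex (logistic) risk, then reweight simulator rollouts to evaluate a
# policy's return in the target domain. Assumes transitions are encoded as
# fixed-length (s, a, s') feature vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_mismatch_estimator(sim_transitions, real_transitions):
    """Fit a logistic classifier on (s, a, s') features from both domains.

    Minimizing the logistic loss is a convex risk minimization problem whose
    optimum yields the density ratio p_real / p_sim through the logit link.
    """
    X = np.vstack([sim_transitions, real_transitions])
    y = np.concatenate([np.zeros(len(sim_transitions)),
                        np.ones(len(real_transitions))])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf


def mismatch_ratio(clf, transitions):
    """Convert classifier probabilities into importance ratios p_real / p_sim."""
    p_real = clf.predict_proba(transitions)[:, 1]
    return p_real / np.clip(1.0 - p_real, 1e-6, None)


def evaluate_policy_in_target(clf, sim_rollouts, gamma=0.99):
    """Estimate the target-domain return of a policy from simulator rollouts.

    Each rollout is a list of (transition_features, reward) pairs; per-step
    ratios are accumulated multiplicatively along the trajectory so that
    simulated returns are corrected toward the target dynamics.
    """
    returns = []
    for rollout in sim_rollouts:
        feats = np.array([t for t, _ in rollout])
        rewards = np.array([r for _, r in rollout])
        weights = np.cumprod(mismatch_ratio(clf, feats))  # trajectory correction
        discounts = gamma ** np.arange(len(rewards))
        returns.append(np.sum(weights * discounts * rewards))
    return float(np.mean(returns))
```

Under these assumptions, one would roll out the candidate policy only in the simulator, collect a modest batch of real-world trajectories once, and reuse the fitted classifier to score any number of policies without further robot time.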