Training novice users to operate an excavator and acquire different skills requires the presence of expert teachers. Given the complexity of the task, finding skilled experts is comparatively expensive, as the process is time-consuming and demands sustained attention. Moreover, because human evaluators tend to be biased, the evaluation process is noisy and leads to high variance in the final scores of operators with similar skills. In this work, we address these issues and propose a novel strategy for the automatic evaluation of excavator operators. We take into account the internal dynamics of the excavator and the safety criterion at every time step to evaluate performance. To further validate our approach, we use this score prediction model as a source of reward for a reinforcement learning agent learning to maneuver an excavator in a simulated environment that closely replicates real-world dynamics. Our results demonstrate that a policy learned with these external reward prediction models yields safer solutions that respect the required dynamic constraints, compared to a policy trained with task-based reward functions only, bringing it one step closer to real-life adoption. To support future research, we release our codebase at https://github.com/pranavAL/InvRL_Auto-Evaluate and video results at https://drive.google.com/file/d/1jR1otOAu8zrY8mkhUOUZW9jkBOAKK71Z/view?usp=share_link .
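As an illustrative aside, below is a minimal, self-contained Python sketch of the idea summarized above: a learned score-prediction model supplies the per-step reward for a reinforcement learning agent in place of a hand-crafted task reward. All names (ScorePredictor, ExcavatorEnvWithLearnedReward), the placeholder dynamics, and the placeholder scoring rule are hypothetical assumptions for illustration and are not taken from the released codebase.

```python
# Conceptual sketch (not the authors' implementation) of using a learned
# score-prediction model as the external reward signal at every time step.
import random


class ScorePredictor:
    """Hypothetical stand-in for the learned operator-score model. It maps the
    excavator's internal dynamics and a safety signal at the current time step
    to a scalar score that is used as the reward."""

    def predict(self, dynamics, safety_violation):
        # Placeholder scoring: penalize safety violations, reward smooth motion.
        smoothness = -sum(abs(v) for v in dynamics) / len(dynamics)
        return smoothness - (10.0 if safety_violation else 0.0)


class ExcavatorEnvWithLearnedReward:
    """Hypothetical environment wrapper that replaces a task-based reward with
    the score predicted by the learned model at every step."""

    def __init__(self, score_model, horizon=200):
        self.score_model = score_model
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0]  # placeholder observation (e.g., joint velocities)

    def step(self, action):
        self.t += 1
        # Placeholder dynamics: next state is a noisy echo of the action.
        state = [a + random.gauss(0.0, 0.05) for a in action]
        safety_violation = max(abs(s) for s in state) > 1.0
        reward = self.score_model.predict(state, safety_violation)
        done = self.t >= self.horizon
        return state, reward, done


if __name__ == "__main__":
    env = ExcavatorEnvWithLearnedReward(ScorePredictor())
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = [random.uniform(-0.5, 0.5) for _ in state]  # random policy
        state, reward, done = env.step(action)
        total += reward
    print(f"episode return under learned reward: {total:.2f}")
```

In the paper's setting, an RL algorithm would optimize a policy against this learned reward rather than the random policy used here for demonstration.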