Learning and Transferring Value Function for Robot Exploration in Subterranean Environments - Zhuanzhi (专知) Papers


Value function · Functional · Episode · State value function · Learned

April 7, 2022

Learning and Transferring Value Function for Robot Exploration in Subterranean Environments


Yafei Hu, Chen Wang, John Keller, Sebastian Scherer

In traditional robot exploration methods, the robot usually has no prior biases about the environment it is exploring. It therefore assigns equal importance to all goals, which leads to inefficient exploration. Alternatively, a hand-tuned policy is often used to adjust the value of goals. In this paper, we present a method to learn how "good" some states are, measured by the state value function, to provide a hint for the robot when it makes exploration decisions. We propose to learn state value functions from previously collected offline datasets and then transfer and improve the value function during testing in a new environment. Moreover, the environments usually provide very little, or even no, extrinsic reward or feedback for the robot. Therefore, in this work we also tackle the problem of sparse extrinsic rewards from the environments. We design several intrinsic rewards to encourage the robot to obtain more information during exploration. These reward functions then become the building blocks of the state value functions. We test our method on challenging subterranean and urban environments. To the best of our knowledge, this work is the first to demonstrate value function prediction with previously collected datasets to help exploration in challenging subterranean environments.
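The idea in the abstract (learn a state value function offline from intrinsic rewards, then keep improving it in a new environment) can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the TD(0) update, the state names, the `intrinsic_reward` function based on newly observed map cells, and the constants `GAMMA` and `ALPHA` are all assumptions chosen only for illustration.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed value)
ALPHA = 0.5  # learning rate (assumed value)


def intrinsic_reward(newly_seen_cells: int, scale: float = 0.1) -> float:
    """Hypothetical intrinsic reward: proportional to newly observed map cells."""
    return scale * newly_seen_cells


def td0_update(values, transitions, alpha=ALPHA, gamma=GAMMA):
    """One TD(0) sweep over (state, newly_seen_cells, next_state, done) tuples."""
    for state, newly_seen, next_state, done in transitions:
        reward = intrinsic_reward(newly_seen)
        # Bootstrap from the next state's value unless the episode ended.
        target = reward + (0.0 if done else gamma * values[next_state])
        values[state] += alpha * (target - values[state])
    return values


# Toy offline dataset (made up): no extrinsic reward, only information gain.
offline_data = [
    ("junction", 30, "corridor", False),
    ("corridor", 10, "dead_end", False),
    ("dead_end", 0, "dead_end", True),
    ("junction", 50, "open_room", False),
    ("open_room", 40, "open_room", True),
]

# Offline phase: learn state values from the previously collected data.
values = defaultdict(float)
for _ in range(50):
    td0_update(values, offline_data)

# Transfer phase: start from the learned values and refine them with
# transitions observed in a new environment.
online_values = defaultdict(float, values)
online_data = [("corridor", 60, "open_room", False)]
td0_update(online_values, online_data)
```

In this toy setting, states that led to more newly observed cells end up with higher values, so a planner could rank candidate goals by `values[state]` instead of treating them equally. The transfer step simply reuses the offline table as the initial estimate, which is one simple way to read "transfer and improve"; the paper itself may use a different function approximator and update rule.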

