CASOG: Conservative Actor-critic with SmOoth Gradient for Skill Learning in Robot-Assisted Intervention


Smoothing · Gradient · Robot · Offline reinforcement learning · Algorithm

April 19, 2023


Hao Li, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Zeng-Guang Hou

From arXiv, 13 pages, 5 figures, preprint

Robot-assisted intervention has been shown in clinical trials to reduce physicians' radiation exposure and to improve precision. However, existing vascular robotic systems follow a master-slave control mode and rely entirely on manual commands. This paper proposes a novel offline reinforcement learning algorithm, Conservative Actor-critic with SmOoth Gradient (CASOG), to learn manipulation skills from human demonstrations on vascular robotic systems. The proposed algorithm conservatively estimates the Q-function and smooths the gradients of convolution layers to deal with distribution shift and overfitting. Furthermore, to focus on complex manipulations, transitions with larger temporal-difference error are sampled with higher probability. Comparative experiments in a pre-clinical environment demonstrate that CASOG can deliver the guidewire to the target with a success rate of 94.00% and a mean of 14.07 backward steps, performing closer to humans and better than prior offline reinforcement learning methods. These results indicate that the proposed algorithm is promising for improving the autonomy of vascular robotic systems.
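The abstract states that transitions with larger temporal-difference (TD) error are sampled with higher probability, but it does not give the exact rule. The sketch below illustrates one common way to do this, in the style of prioritized experience replay: each transition gets a priority of (|TD error| + eps) raised to a power alpha, and priorities are normalized into probabilities. The function names and the values of `alpha` and `eps` are illustrative assumptions, not details taken from the paper.

```python
import random


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Turn TD errors into sampling probabilities.

    Assumption: priority = (|td_error| + eps) ** alpha, as in
    prioritized experience replay. alpha and eps are illustrative
    values, not hyperparameters reported for CASOG.
    """
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, batch_size, rng=None, alpha=0.6):
    """Draw transition indices (with replacement) from the dataset.

    Transitions with a larger absolute TD error are drawn more often.
    """
    rng = rng or random.Random()
    probs = sampling_probabilities(td_errors, alpha=alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


if __name__ == "__main__":
    # Hypothetical TD errors for four stored transitions.
    td_errors = [0.1, 0.1, 5.0, 0.1]
    print(sampling_probabilities(td_errors))
    print(sample_indices(td_errors, batch_size=8, rng=random.Random(0)))
```

With `alpha=0` every transition is sampled uniformly, and larger values of `alpha` concentrate sampling on high-error transitions. In an offline setting the TD errors would be recomputed periodically as the critic changes.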

