On Many-Actions Policy Gradient

Variance · Biased · Sample · Stochastic policy · Continuity

May 11, 2023

Michal Nauman, Marek Cygan

From arXiv, ICML 2023

We study the variance of stochastic policy gradients (SPGs) with many action samples per state. We derive a many-actions optimality condition, which determines when many-actions SPG yields lower variance as compared to a single-action agent with proportionally extended trajectory. We propose Model-Based Many-Actions (MBMA), an approach leveraging dynamics models for many-actions sampling in the context of SPG. MBMA addresses issues associated with existing implementations of many-actions SPG and yields lower bias and comparable variance to SPG estimated from states in model-simulated rollouts. We find that MBMA bias and variance structure matches that predicted by theory. As a result, MBMA achieves improved sample efficiency and higher returns on a range of continuous action environments as compared to model-free, many-actions, and model-based on-policy SPG baselines.
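To make the variance question in the abstract concrete, here is a minimal sketch of a score-function gradient estimator on a toy problem. It is not the paper's derivation and not MBMA. The state distribution, the Gaussian policy with mean `theta * s`, the quadratic return signal, and every function name below are assumptions made for illustration. States are drawn independently, so the sketch does not model trajectories. It compares spending a budget of n samples on n actions at one state against spending it on n states with one action each.

```python
import random
import statistics


def many_actions_estimate(rng, n_actions, theta=0.0, sigma=1.0):
    """One gradient estimate from a single state and n_actions sampled actions."""
    s = rng.gauss(0.0, 1.0)                      # toy state distribution (assumption)
    total = 0.0
    for _ in range(n_actions):
        a = rng.gauss(theta * s, sigma)          # Gaussian policy with mean theta * s
        score = (a - theta * s) * s / sigma**2   # d/dtheta of log pi(a | s)
        q = -(a - s) ** 2                        # toy return signal (assumption)
        total += score * q
    return total / n_actions


def many_states_estimate(rng, n_states, theta=0.0, sigma=1.0):
    """Same sample budget: n_states independent states, one action per state."""
    draws = [many_actions_estimate(rng, 1, theta, sigma) for _ in range(n_states)]
    return sum(draws) / n_states


def estimator_stats(estimator, n, trials=20000, seed=0):
    """Empirical mean and variance of an estimator over repeated trials."""
    rng = random.Random(seed)
    draws = [estimator(rng, n) for _ in range(trials)]
    return statistics.fmean(draws), statistics.variance(draws)


if __name__ == "__main__":
    for n in (1, 4, 16):
        _, var_actions = estimator_stats(many_actions_estimate, n)
        _, var_states = estimator_stats(many_states_estimate, n)
        print(f"n={n:2d}  many-actions var={var_actions:.2f}  "
              f"many-states var={var_states:.2f}")
```

In this toy, extra actions per state shrink only the action-conditional part of the variance, while the part due to which state was drawn remains. Whether many-actions sampling beats a proportionally longer trajectory in a real environment is what the many-actions optimality condition addresses.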
