带有整数动作的软动动控点 (Soft Actor-Critic With Integer Actions) - 专知论文

会员服务 ·

0

再参数化/重参数化 · SOFT · SAC · 离散化 · Continuity ·

2021 年 9 月 17 日

Soft Actor-Critic With Integer Actions

翻译：带有整数动作的软动动控点

Ting-Han Fan,Yubo Wang

Reinforcement learning is well-studied under discrete actions. Integer actions setting is popular in the industry yet still challenging due to its high dimensionality. To this end, we study reinforcement learning under integer actions by incorporating the Soft Actor-Critic (SAC) algorithm with an integer reparameterization. Our key observation for integer actions is that their discrete structure can be simplified using their comparability property. Hence, the proposed integer reparameterization does not need one-hot encoding and is of low dimensionality. Experiments show that the proposed SAC under integer actions is as good as the continuous action version on robot control tasks and outperforms Proximal Policy Optimization on power distribution systems control tasks.

翻译：强化学习是在离散动作下进行的研究。整数动作设置在行业中很受欢迎, 但由于其高维度, 仍然具有挑战性。为此, 我们研究在整数动作下进行强化学习, 将软Acor- Critic( SAC) 算法与整数再校准法结合。我们对整数动作的关键观察是, 它们的离散结构可以使用其可比性属性进行简化。因此, 拟议的整数重新计量不需要一热编码, 并且是低维度的。实验显示, 拟议的SAC 整数下动作与机器人控制任务的持续操作版本和超效的配电系统控制任务Proximal政策优化相同。

0

相关内容

再参数化/重参数化

再参数化/重参数化

可解释强化学习，Explainable Reinforcement Learning: A Survey

可解释强化学习，Explainable Reinforcement Learning: A Survey

专知会员服务

131+阅读 · 2020年5月14日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【电子书推荐】入门深度学习的Python书籍，Grokking Deep Learning最新版，Google DeepMind|Andrew Trask

【电子书推荐】入门深度学习的Python书籍，Grokking Deep Learning最新版，Google DeepMind|Andrew Trask

专知会员服务

28+阅读 · 2019年12月5日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

最前沿：深度解读Soft Actor-Critic 算法

最前沿：深度解读Soft Actor-Critic 算法

极市平台

55+阅读 · 2019年7月28日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Shape Programmable Magnetic Pixel Soft Robot

Shape Programmable Magnetic Pixel Soft Robot

Arxiv

0+阅读 · 2021年11月5日

Supervised Advantage Actor-Critic for Recommender Systems

Arxiv

1+阅读 · 2021年11月5日

Directed flow-augmentation

Arxiv

0+阅读 · 2021年11月5日

Modeling and Control of an Omnidirectional Micro Aerial Vehicle Equipped with a Soft Robotic Arm

Arxiv

0+阅读 · 2021年11月4日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

Learning to Walk via Deep Reinforcement Learning

Arxiv

7+阅读 · 2018年12月26日

Information-Directed Exploration for Deep Reinforcement Learning

Information-Directed Exploration for Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年12月18日

Multiagent Soft Q-Learning

Arxiv

11+阅读 · 2018年4月25日

Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning

Arxiv

4+阅读 · 2018年4月22日

Virtual-to-Real: Learning to Control in Visual Semantic Segmentation

Arxiv

4+阅读 · 2018年3月18日

VIP会员

文章信息

相关主题

再参数化/重参数化

相关VIP内容

可解释强化学习，Explainable Reinforcement Learning: A Survey

可解释强化学习，Explainable Reinforcement Learning: A Survey

专知会员服务

131+阅读 · 2020年5月14日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【电子书推荐】入门深度学习的Python书籍，Grokking Deep Learning最新版，Google DeepMind|Andrew Trask

【电子书推荐】入门深度学习的Python书籍，Grokking Deep Learning最新版，Google DeepMind|Andrew Trask

专知会员服务

28+阅读 · 2019年12月5日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】数据驱动决策中的激励、信息与不确定性

DGP双粒度提示框架：图增强大模型助力欺诈检测

【ICCV2025】ESSENTIAL：用于视频类增量学习的情景记忆与语义记忆整合

唯快不破：大型语言模型高效架构综述

相关资讯

最前沿：深度解读Soft Actor-Critic 算法

最前沿：深度解读Soft Actor-Critic 算法

极市平台

55+阅读 · 2019年7月28日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Shape Programmable Magnetic Pixel Soft Robot

Shape Programmable Magnetic Pixel Soft Robot

Arxiv

0+阅读 · 2021年11月5日

Supervised Advantage Actor-Critic for Recommender Systems

Arxiv

1+阅读 · 2021年11月5日

Directed flow-augmentation

Arxiv

0+阅读 · 2021年11月5日

Modeling and Control of an Omnidirectional Micro Aerial Vehicle Equipped with a Soft Robotic Arm

Arxiv

0+阅读 · 2021年11月4日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

Learning to Walk via Deep Reinforcement Learning

Arxiv

7+阅读 · 2018年12月26日

Information-Directed Exploration for Deep Reinforcement Learning

Information-Directed Exploration for Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年12月18日

Multiagent Soft Q-Learning

Arxiv

11+阅读 · 2018年4月25日

Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning

Arxiv

4+阅读 · 2018年4月22日

Virtual-to-Real: Learning to Control in Visual Semantic Segmentation

Arxiv

4+阅读 · 2018年3月18日

微信扫码咨询专知VIP会员