This paper presents a model-free reinforcement learning (RL) algorithm to synthesize a control policy that maximizes the satisfaction probability of linear temporal logic (LTL) specifications. To account for environment and motion uncertainties, we model the robot motion as a probabilistic labeled Markov decision process with unknown transition probabilities and an unknown probabilistic label function. The LTL task specification is converted to a limit-deterministic generalized B\"uchi automaton (LDGBA) with several accepting sets, which maintains dense rewards during learning. The novelty of applying the LDGBA lies in constructing an embedded LDGBA (E-LDGBA) via a designed synchronous tracking-frontier function, which records the unvisited accepting sets without increasing the dimensionality or computational complexity. With appropriately designed dependent reward and discount functions, rigorous analysis shows that any method optimizing the expected discounted return of the proposed RL-based approach is guaranteed to find an optimal policy that maximizes the satisfaction probability of the LTL specification. A model-free RL-based motion planning strategy is then developed to generate such an optimal policy. The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
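As a rough illustration of the synchronous tracking-frontier mechanism, the sketch below (in Python, with illustrative names; the reset rule and data layout are assumptions rather than the paper's exact definitions) removes each accepting set from the frontier once the automaton visits it, and resets the frontier after a full round, so that accepting visits, and hence rewards, remain dense under the generalized B\"uchi condition.

```python
# A minimal sketch of a synchronous tracking-frontier update, assuming an
# LDGBA whose generalized Buchi condition is given by accepting sets
# F_1, ..., F_f of automaton states. Names and the reset rule are
# illustrative assumptions, not the paper's exact definitions.

def update_frontier(q, frontier, accepting_sets):
    """Drop every accepting set in the frontier that the current automaton
    state q belongs to; once all sets have been visited (frontier empty),
    reset it so repeated rounds keep producing accepting visits."""
    frontier = {i for i in frontier if q not in accepting_sets[i]}
    if not frontier:  # assumed reset rule: start a new round of accepting sets
        frontier = {i for i, F in enumerate(accepting_sets) if q not in F}
    return frontier


# Example: two accepting sets over automaton states {0, 1, 2}.
F = [{0}, {2}]
T = {0, 1}                    # frontier initially tracks both accepting sets
T = update_frontier(0, T, F)  # visits F_1 -> T == {1}
T = update_frontier(2, T, F)  # visits F_2, frontier empties -> reset to {0}
```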