Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for Partially Observable Markov Decision Processes (POMDPs). It scales to large state spaces by computing a local, online approximation of the optimal policy using a strategy based on Monte Carlo Tree Search. However, POMCP suffers from sparse reward functions, i.e., rewards achieved only when the final goal is reached, particularly in environments with large state spaces and long horizons. Recently, logic specifications have been integrated into POMCP to guide exploration and to satisfy safety requirements. However, such policy-related rules must be manually defined by domain experts, which is especially demanding in real-world scenarios. In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions, i.e., sets of belief-action pairs generated by the planner. Specifically, we learn rules expressed in the paradigm of answer set programming. We then integrate them inside POMCP to provide a soft policy bias toward promising actions. On two benchmark scenarios, rocksample and battery, we show that integrating rules learned from small task instances improves performance even with fewer Monte Carlo simulations and on larger task instances. We make our modified version of POMCP publicly available at https://github.com/GiuMaz/pomcp_clingo.git.
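For illustration only, and not the authors' exact formulation, the following Python sketch shows one way a "soft policy bias" of this kind could be injected into UCB1 action selection inside the search tree: actions for which a learned rule fires in the current belief receive a bounded bonus, so the rules prefer but never forbid actions. All names and parameters here (`ucb1_with_soft_bias`, `rule_bias`, the weight `w`) are illustrative assumptions, not part of the paper or the released code.

```python
import math
import random

def ucb1_with_soft_bias(node, actions, c=1.0, w=0.5, rule_bias=None):
    """Select an action at a search-tree node using UCB1 plus an optional
    soft bias toward actions suggested by learned logic rules.

    node:      dict with per-action visit counts node["N"][a] and value
               estimates node["Q"][a]
    rule_bias: dict mapping actions to a bias in [0, 1], e.g. 1.0 if a
               learned rule fires for the current belief, 0.0 otherwise
    w:         weight of the soft bias (hypothetical tuning parameter)
    """
    total_visits = sum(node["N"][a] for a in actions) or 1
    rule_bias = rule_bias or {}

    def score(a):
        n = node["N"][a]
        if n == 0:
            return float("inf")           # expand unvisited actions first
        exploration = c * math.sqrt(math.log(total_visits) / n)
        bias = w * rule_bias.get(a, 0.0)  # soft guidance: a bounded bonus only
        return node["Q"][a] + exploration + bias

    best = max(score(a) for a in actions)
    return random.choice([a for a in actions if score(a) == best])
```

Because the bias is a bounded additive term rather than a hard constraint, exploration can still override it when the value estimates disagree with the learned rules.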