Eventual Discounting Temporal Logic Counterfactual Experience Replay - 专知 Papers

Experience replay · Continuity · Discretization · Linear · Scalar

March 3, 2023

Eventual Discounting Temporal Logic Counterfactual Experience Replay

Cameron Voloshin, Abhinav Verma, Yisong Yue

Linear temporal logic (LTL) offers a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions. However, the standard RL framework can be too myopic to find maximally LTL-satisfying policies. This paper makes two contributions. First, we develop a new value-function-based proxy, using a technique we call eventual discounting, under which one can find policies that satisfy the LTL specification with the highest achievable probability. Second, we develop a new experience replay method for generating off-policy data from on-policy rollouts via counterfactual reasoning on different ways of satisfying the LTL specification. Our experiments, conducted in both discrete and continuous state-action spaces, confirm the effectiveness of our counterfactual experience replay approach.
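
The abstract does not spell out either technique, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that eventual discounting means a reward of 1 on each visit to an accepting automaton state, with the discount applied per accepting visit rather than per time step, and that counterfactual replay means re-evaluating one observed environment step from every automaton state using a known automaton transition function. All names (`eventually_discounted_return`, `counterfactual_transitions`, `toy_label`, `toy_delta`) are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, Sequence, Tuple


def eventually_discounted_return(accepting: Sequence[bool], gamma: float) -> float:
    """Reward 1 per accepting visit; discount only when an accepting visit occurs.

    Assumption: the discount exponent is the number of accepting visits
    seen so far, so steps spent between accepting visits cost nothing.
    """
    total, visits = 0.0, 0
    for hit in accepting:
        if hit:
            total += gamma ** visits
            visits += 1
    return total


def standard_discounted_return(accepting: Sequence[bool], gamma: float) -> float:
    """Usual per-step discounting, shown only for comparison."""
    return sum(gamma ** t for t, hit in enumerate(accepting) if hit)


def counterfactual_transitions(
    s: Hashable,
    a: Hashable,
    s_next: Hashable,
    automaton_states: Iterable[Hashable],
    delta: Callable[[Hashable, Hashable], Hashable],
    label: Callable[[Hashable], Hashable],
) -> List[Tuple[Hashable, Hashable, Hashable, Hashable, Hashable]]:
    """Replay one environment step (s, a, s_next) from every automaton state.

    Assumption: the automaton transition function `delta` and the labeling
    function `label` are known, so each tuple (s, b, a, s_next, b_next)
    can be computed without further interaction with the environment.
    """
    sigma = label(s_next)
    return [(s, b, a, s_next, delta(b, sigma)) for b in automaton_states]


# Hypothetical two-state automaton for "eventually reach the goal":
# state 0 = goal not yet seen, state 1 = goal seen (absorbing).
def toy_label(state: int) -> str:
    return "goal" if state == 3 else "none"


def toy_delta(b: int, sigma: str) -> int:
    return 1 if sigma == "goal" else b


quick = [True, True]                      # accepting visits at t = 0, 1
slow = [False, False, True, False, True]  # same visits, but delayed

print(eventually_discounted_return(quick, 0.5), eventually_discounted_return(slow, 0.5))
print(standard_discounted_return(quick, 0.5), standard_discounted_return(slow, 0.5))
print(counterfactual_transitions(2, "right", 3, [0, 1], toy_delta, toy_label))
```

Under these assumptions the two trajectories receive the same eventually discounted return because they visit accepting states equally often, whereas per-step discounting scores the delayed trajectory lower. That gap is one way to read the "myopic" behaviour the abstract attributes to the standard RL framework.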

