Will My Robot Achieve My Goals? Predicting the Probability that an MDP Policy Reaches a User-Specified Behavior Target - Zhuanzhi Papers

November 29, 2022

Alexander Guyer, Thomas G. Dietterich

From arXiv; 12 pages, 4 figures. Appears in Proceedings of AAAI FSS-22 Symposium "Lessons Learned for Autonomous Assessment of Machine Abilities (LLAAMA)"

As an autonomous system performs a task, it should maintain a calibrated estimate of the probability that it will achieve the user's goal. If that probability falls below some desired level, it should alert the user so that appropriate interventions can be made. This paper considers settings where the user's goal is specified as a target interval for a real-valued performance summary, such as the cumulative reward, measured at a fixed horizon $H$. At each time $t \in \{0, \ldots, H-1\}$, our method produces a calibrated estimate of the probability that the final cumulative reward will fall within a user-specified target interval $[y^-, y^+]$. Using this estimate, the autonomous system can raise an alarm if the probability drops below a specified threshold. We compute the probability estimates by inverting conformal prediction. Our starting point is the Conformalized Quantile Regression (CQR) method of Romano et al., which applies split-conformal prediction to the results of quantile regression. CQR is not invertible, but by using the conditional cumulative distribution function (CDF) as the non-conformity measure, we show how to obtain an invertible modification that we call Probability-space Conformalized Quantile Regression (PCQR). Like CQR, PCQR produces well-calibrated conditional prediction intervals with finite-sample marginal guarantees. By inverting PCQR, we obtain marginal guarantees for the probability that the cumulative reward of an autonomous system will fall within an arbitrary user-specified target interval. Experiments on two domains confirm that these probabilities are well-calibrated.
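To make the starting point concrete, the following is a minimal sketch of the split-conformal calibration step used by CQR, assuming the lower and upper quantile predictions have already been produced by some quantile regressor. It illustrates only the standard CQR correction, not PCQR or its inversion, whose details are not given in the abstract. The function names `cqr_calibrate` and `cqr_interval` are hypothetical and chosen for illustration.

```python
import math
import numpy as np


def cqr_calibrate(q_lo_cal, q_hi_cal, y_cal, alpha):
    """Return the CQR correction computed on a held-out calibration set.

    q_lo_cal, q_hi_cal: predicted lower/upper quantiles (assumed given).
    y_cal: observed outcomes for the same calibration points.
    alpha: target miscoverage rate, e.g. 0.1 for 90% intervals.
    """
    q_lo_cal = np.asarray(q_lo_cal, dtype=float)
    q_hi_cal = np.asarray(q_hi_cal, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    n = len(y_cal)

    # Non-conformity score: how far y lies outside the predicted band
    # (negative when y is strictly inside the band).
    scores = np.maximum(q_lo_cal - y_cal, y_cal - q_hi_cal)

    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return float(np.sort(scores)[k - 1])


def cqr_interval(q_lo, q_hi, correction):
    """Widen (or shrink) a predicted quantile band by the calibrated correction."""
    return q_lo - correction, q_hi + correction


# Toy calibration set: the band [0, 10] is predicted for every point.
y_cal = [5, 11, 12, 13, 14, 15, 16, 17, 18, 19]
correction = cqr_calibrate(np.zeros(10), np.full(10, 10.0), y_cal, alpha=0.2)
lo, hi = cqr_interval(2.0, 4.0, correction)
```

The correction is a single scalar shared by all test points, which is why the resulting coverage guarantee is marginal rather than conditional on the input.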

