HSVI for zs-POSGs using Concavity, Convexity and Lipschitz Properties - 专知 (Zhuanzhi) Papers

Lipschitz · approximation · dynamic programming · state-of-the-art · heuristic algorithms

November 15, 2022

HSVI for zs-POSGs using Concavity, Convexity and Lipschitz Properties

Aurélien Delage, Olivier Buffet, Jilles Dibangoye

From arXiv: 37 pages, 4 figures, 4 tables, 3 algorithms

Dynamic programming and heuristic search are at the core of state-of-the-art solvers for sequential decision-making problems. In partially observable or collaborative settings (e.g., POMDPs and Dec-POMDPs), this requires introducing an appropriate statistic that induces a fully observable problem, as well as bounding (convex) approximators of the optimal value function. This approach has also succeeded in some subclasses of 2-player zero-sum partially observable stochastic games (zs-POSGs), but failed in the general case despite known concavity and convexity properties, which only led to heuristic algorithms with poor convergence guarantees. We overcome this issue by leveraging these properties to derive bounding approximators and efficient update and selection operators, and then derive a prototypical solver inspired by HSVI that provably converges to an $\epsilon$-optimal solution in finite time, and which we empirically evaluate. This opens the door to a novel family of promising approaches complementing those relying on linear programming or iterative methods.
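To make the idea of bounding approximators concrete, here is a minimal sketch of generic Lipschitz interpolation. It is not the paper's update or selection operators, which the abstract does not spell out. It assumes a value function that is `lam`-Lipschitz in the L1 norm and a set of points where upper or lower bound values are already known. The names `upper_bound`, `lower_bound`, `gap` and `lam` are hypothetical.

```python
from typing import Sequence, Tuple

# (location x_i, known bound value v_i at that location); hypothetical layout
Point = Tuple[Sequence[float], float]


def l1_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """L1 distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y))


def upper_bound(x: Sequence[float], points: Sequence[Point], lam: float) -> float:
    """Tightest upper bound at x implied by lam-Lipschitz continuity."""
    return min(v + lam * l1_distance(x, xi) for xi, v in points)


def lower_bound(x: Sequence[float], points: Sequence[Point], lam: float) -> float:
    """Tightest lower bound at x implied by lam-Lipschitz continuity."""
    return max(v - lam * l1_distance(x, xi) for xi, v in points)


def gap(x, upper_pts, lower_pts, lam: float) -> float:
    """Width of the bounding interval at x; an HSVI-style search
    stops once this gap at the initial point is at most epsilon."""
    return upper_bound(x, upper_pts, lam) - lower_bound(x, lower_pts, lam)


# Toy target f(b) = b[0] on 2-state distributions; it is 1-Lipschitz in L1.
pts = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0)]
query = (0.5, 0.5)
print(gap(query, pts, pts, lam=1.0))

# Adding an exact value at the query point closes the gap there.
pts.append(((0.5, 0.5), 0.5))
print(gap(query, pts, pts, lam=1.0))
```

In this toy setting each new point tightens both bounds in its neighbourhood, which is the mechanism that lets a search driven by the bound gap terminate; the solver described in the abstract additionally exploits concavity and convexity, which this sketch omits.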
