This work studies efficient solution methods for cluster-based control of transition-independent Markov decision processes (TI-MDPs). We focus on the control of multi-agent systems, whereby a central planner influences agents to select target joint strategies. Under mild assumptions, this problem can be modeled as a TI-MDP in which agents are partitioned into disjoint clusters such that each cluster can receive a unique control. To efficiently find a policy in the exponentially expanded joint action space, we present a clustered Bellman operator that optimizes over the action space of a single cluster at each evaluation. We then present Clustered Value Iteration (CVI), which uses this operator to iteratively perform round-robin optimization across the clusters. CVI converges exponentially faster than standard value iteration (VI) and can find policies that closely approximate the MDP's true optimal value. A special class of TI-MDPs with separable reward functions is investigated, and it is shown that CVI finds optimal policies on this class. Next, the optimal clustering assignment problem is explored. The value functions of separable TI-MDPs are shown to be submodular functions, and notions of submodularity are used to analyze an iterative greedy cluster-splitting algorithm. The values attained by this clustering technique are shown to form a monotonic, submodular lower bound on the values of the optimal clustering assignment. Finally, these control ideas are demonstrated on simulated examples.
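To make the round-robin structure concrete, the following is a minimal Python sketch of CVI on a randomly generated toy MDP. All names and the toy dynamics (n_states, cluster_actions, P, R, gamma) are illustrative assumptions rather than the paper's notation, and the sketch does not encode the transition-independence structure the paper exploits; it only illustrates the clustered Bellman operator (optimizing one cluster's action while holding the others fixed) and the round-robin sweep.

```python
# A minimal sketch of Clustered Value Iteration (CVI) on a toy finite MDP.
# The joint action is a tuple with one entry per cluster; the clustered
# Bellman operator improves one cluster's coordinate at a time. All
# quantities here are illustrative assumptions, not the paper's setup.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_states = 5
cluster_actions = [2, 3]            # |A_k| for each cluster k
joint_actions = list(itertools.product(*[range(n) for n in cluster_actions]))
a_idx = {a: i for i, a in enumerate(joint_actions)}
gamma = 0.9

# Toy dynamics and rewards indexed by (state, joint-action index).
P = rng.dirichlet(np.ones(n_states), size=(n_states, len(joint_actions)))
R = rng.random((n_states, len(joint_actions)))

def clustered_bellman(V, policy, k):
    """One application of the clustered Bellman operator for cluster k:
    at each state, optimize cluster k's action with the other clusters'
    actions frozen at the current policy."""
    V_new = np.empty_like(V)
    for s in range(n_states):
        best, best_ak = -np.inf, policy[s][k]
        for ak in range(cluster_actions[k]):
            a = list(policy[s])
            a[k] = ak
            i = a_idx[tuple(a)]
            q = R[s, i] + gamma * P[s, i] @ V
            if q > best:
                best, best_ak = q, ak
        V_new[s] = best
        policy[s][k] = best_ak
    return V_new

# CVI: round-robin sweeps over the clusters until the values stabilize.
# Per the abstract, this converges, in general to a policy whose value
# approximates (and for separable rewards equals) the true optimum.
V = np.zeros(n_states)
policy = [[0] * len(cluster_actions) for _ in range(n_states)]
for sweep in range(1000):
    V_old = V.copy()
    for k in range(len(cluster_actions)):
        V = clustered_bellman(V, policy, k)
    if np.max(np.abs(V - V_old)) < 1e-8:
        break

print("greedy clustered policy:", policy)
print("values:", np.round(V, 3))
```

Note the cost structure this coordinate-wise scheme buys: each sweep enumerates the clusters' action sets separately (a sum of |A_k| terms per state) rather than the full joint action space (a product of |A_k| terms), which is the source of the speedup over standard VI claimed above.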