Learning Mixtures of Markov Chains and MDPs - 专知 Papers


Markov · Markov chains · Estimation/estimators · Learning · Model evaluation

January 28, 2023

Learning Mixtures of Markov Chains and MDPs


Chinmaya Kausik, Kevin Tan, Ambuj Tewari

From arXiv. 51 pages (13-page paper, 38-page appendix). Paper restructured and refined, corrections made to proofs, experiments added.

We present an algorithm for learning mixtures of Markov chains and Markov decision processes (MDPs) from short unlabeled trajectories. Specifically, our method handles mixtures of Markov chains with optional control input by going through a multi-step process, involving (1) a subspace estimation step, (2) spectral clustering of trajectories using "pairwise distance estimators," along with refinement using the EM algorithm, (3) a model estimation step, and (4) a classification step for predicting labels of new trajectories. We provide end-to-end performance guarantees, where we only explicitly require the length of trajectories to be linear in the number of states and the number of trajectories to be linear in a mixing time parameter. Experimental results support these guarantees, where we attain 96.6% average accuracy on a mixture of two MDPs in gridworld, outperforming the EM algorithm with random initialization (73.2% average accuracy).
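To make steps (3) and (4) of the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes plain Markov chains with no control input, assumes cluster labels for the training trajectories are already available (steps (1) and (2) are skipped), and uses count-based estimation with additive smoothing followed by maximum-likelihood classification. All function names, the `smoothing` parameter, and the toy transition matrices are hypothetical choices made for illustration.

```python
import numpy as np


def estimate_transition_matrix(trajectories, n_states, smoothing=1.0):
    """Count-based transition matrix estimate with additive smoothing."""
    counts = np.full((n_states, n_states), smoothing, dtype=float)
    for traj in trajectories:
        for s, s_next in zip(traj[:-1], traj[1:]):
            counts[s, s_next] += 1.0
    # Normalize each row so it is a probability distribution.
    return counts / counts.sum(axis=1, keepdims=True)


def log_likelihood(traj, P):
    """Log-probability of the transitions in traj under P (initial state ignored)."""
    return float(sum(np.log(P[s, s_next]) for s, s_next in zip(traj[:-1], traj[1:])))


def classify(traj, models):
    """Return the index of the model under which traj is most likely."""
    return int(np.argmax([log_likelihood(traj, P) for P in models]))


def sample_trajectory(P, length, rng):
    """Sample a trajectory of the given length from chain P, uniform start state."""
    state = int(rng.integers(P.shape[0]))
    traj = [state]
    for _ in range(length - 1):
        state = int(rng.choice(P.shape[0], p=P[state]))
        traj.append(state)
    return traj


# Toy mixture of two 3-state chains (hypothetical): one "sticky", one "cycling".
P_sticky = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
P_cycle = np.array([[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]])

rng = np.random.default_rng(0)
train = {k: [sample_trajectory(P, 20, rng) for _ in range(50)]
         for k, P in enumerate([P_sticky, P_cycle])}

# Step (3): one estimated model per cluster.
models = [estimate_transition_matrix(train[k], n_states=3) for k in (0, 1)]

# Step (4): predict the label of a new, unlabeled trajectory.
new_traj = sample_trajectory(P_cycle, 20, rng)
print("predicted label:", classify(new_traj, models))
```

In the setting the abstract describes, the labels passed to the estimation step would come from the clustering stage rather than from ground truth, so errors made while clustering would carry over into the estimated models.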

