We present an algorithm for learning mixtures of Markov chains and Markov decision processes (MDPs) from short unlabeled trajectories. Specifically, our method handles mixtures of Markov chains with optional control input by going through a multi-step process, involving (1) a subspace estimation step, (2) spectral clustering of trajectories using "pairwise distance estimators," along with refinement using the EM algorithm, (3) a model estimation step, and (4) a classification step for predicting labels of new trajectories. We provide end-to-end performance guarantees, where we only explicitly require the length of trajectories to be linear in the number of states and the number of trajectories to be linear in a mixing time parameter. Experimental results support these guarantees, where we attain 96.6% average accuracy on a mixture of two MDPs in gridworld, outperforming the EM algorithm with random initialization (73.2% average accuracy).
翻译:我们提出了一个从短的无标签轨迹中学习 Markov 链和 Markov 决定过程( MDPs) 混合物的算法。 具体地说, 我们的方法通过一个多步过程处理 Markov 链和可选控制输入的混合物, 包括:(1) 一个子空间估计步骤, (2) 使用“ 偏差距离估计器” 的轨迹的光谱聚合, 并使用 EM 算法进行精细化, (3) 一个模型估计步骤, 以及 (4) 用于预测新轨迹标签的分类步骤。 我们提供了端到端的性能保证, 我们只明确要求轨道长度在州数中线性长, 而轨径数在混合时间参数中线性。 实验结果支持了这些保证, 在电网世界的两个 MDP 混合物中,我们达到96.6%的平均精度, 以随机初始化( 平均精确度为73.2%) 。