We present an algorithm for learning mixtures of Markov chains (MCs) and of Markov decision processes (offline latent MDPs) from trajectories, with roots dating back to the work of Vempala and Wang. The setting is thus that of Markov chains with an optional control input. The method is modular and consists of (1) a subspace estimation step, (2) spectral clustering of trajectories, and (3) a few iterations of the EM algorithm. We provide end-to-end performance guarantees that explicitly require only that the number of trajectories be linear in the number of states and that the trajectory length be linear in the mixing time. Experimental results suggest that it outperforms both EM (95.4% on average) and a previous method by Gupta et al. (54.1%), obtaining 100% permuted accuracy on an 8x8 gridworld.
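The abstract reports "permuted accuracy" without defining it. A minimal sketch follows, assuming the term means clustering accuracy maximized over all relabelings of the predicted clusters, since cluster indices recovered from a mixture are identifiable only up to permutation. The function name `permuted_accuracy` and its arguments are hypothetical and do not come from the paper.

```python
from itertools import permutations


def permuted_accuracy(true_labels, pred_labels, num_clusters):
    """Best fraction of matching labels over all relabelings of the predicted clusters.

    Assumption: labels are integers in range(num_clusters). This is an
    illustrative definition, not necessarily the one used in the paper.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("label sequences must be non-empty")

    best_hits = 0
    for perm in permutations(range(num_clusters)):
        # perm[k] is the true label that predicted cluster k is mapped to.
        hits = sum(1 for t, p in zip(true_labels, pred_labels) if perm[p] == t)
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)
```

For example, `permuted_accuracy([0, 0, 1, 1, 2], [1, 1, 0, 0, 2], 3)` evaluates to `1.0`, because swapping predicted labels 0 and 1 reproduces the true labels exactly. The brute-force search visits all `num_clusters!` permutations, which is fine for a handful of mixture components; for larger counts, an assignment solver such as the Hungarian algorithm would be the usual substitute.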