The Continuous-Time Hidden Markov Model (CT-HMM) is an attractive approach to modeling disease progression because it can describe noisy observations that arrive irregularly in time. However, the lack of an efficient parameter learning algorithm for CT-HMMs restricts their use to very small models or requires unrealistic constraints on the state transitions. In this paper, we present the first complete characterization of efficient EM-based learning methods for CT-HMMs, as well as the first solution to decoding the optimal state transition sequence and the corresponding state dwelling times. We show that EM-based learning poses two challenges: the estimation of posterior state probabilities and the computation of end-state conditioned statistics. We solve the first challenge by reformulating the estimation problem as an equivalent discrete time-inhomogeneous hidden Markov model. We address the second challenge by adapting three distinct approaches from the continuous-time Markov chain (CTMC) literature to the CT-HMM domain. We further improve the efficiency of the most efficient of these methods by a factor of the number of states. For decoding, we incorporate a state-of-the-art method from the CTMC literature and extend end-state conditioned optimal state sequence decoding to the CT-HMM case, including the computation of the expected state dwelling time. We demonstrate the use of CT-HMMs with more than 100 states to visualize and predict disease progression on a glaucoma dataset and an Alzheimer's disease dataset, and to decode and visualize the most probable state transition trajectory for individuals in the glaucoma dataset, which helps to identify progressing phenotypes in a comprehensive way. Finally, we apply the CT-HMM modeling and decoding strategy to investigate the progression of language acquisition and development.
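To make the first challenge concrete, the following is a minimal sketch of how posterior state probabilities might be estimated by treating a CT-HMM as a discrete time-inhomogeneous HMM: each gap between consecutive visits gets its own transition matrix exp(Q * dt), and a standard scaled forward-backward pass then runs over the visits. The names `Q` (rate matrix), `init`, `times`, and `emis` (per-visit observation likelihoods) are assumptions introduced for illustration and do not come from the paper; the small `expm` helper is a plain truncated-Taylor approximation used only to keep the sketch dependency-free, and a library routine such as `scipy.linalg.expm` would normally take its place.

```python
import math

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def expm(Q, t, terms=20, squarings=10):
    """Approximate exp(Q * t) with a truncated Taylor series plus scaling and squaring."""
    n = len(Q)
    scale = t / (2 ** squarings)
    A = [[q * scale for q in row] for row in Q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[r + x for r, x in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def normalize(vec):
    total = sum(vec)
    return [x / total for x in vec]

def posterior_state_probs(Q, init, times, emis):
    """Posterior p(state at visit v | all observations), one row per visit.

    emis[v][s] is the (assumed given) likelihood of the observation at visit v
    under hidden state s.
    """
    n, V = len(Q), len(times)
    # One transition matrix per inter-visit interval: the time-inhomogeneous part.
    P = [expm(Q, times[v + 1] - times[v]) for v in range(V - 1)]

    # Scaled forward pass.
    alpha = [normalize([init[s] * emis[0][s] for s in range(n)])]
    for v in range(1, V):
        alpha.append(normalize([
            emis[v][j] * sum(alpha[v - 1][i] * P[v - 1][i][j] for i in range(n))
            for j in range(n)
        ]))

    # Scaled backward pass.
    beta = [[1.0] * n for _ in range(V)]
    for v in range(V - 2, -1, -1):
        beta[v] = normalize([
            sum(P[v][i][j] * emis[v + 1][j] * beta[v + 1][j] for j in range(n))
            for i in range(n)
        ])

    # Combine and renormalize per visit.
    return [normalize([a * b for a, b in zip(alpha[v], beta[v])]) for v in range(V)]

# Hypothetical two-state progression model: state 0 -> state 1, state 1 absorbing.
Q = [[-0.5, 0.5], [0.0, 0.0]]
times = [0.0, 0.7, 3.1]                      # irregular visit times
emis = [[0.6, 0.4], [0.5, 0.5], [1.0, 0.0]]  # last visit is only consistent with state 0
gamma = posterior_state_probs(Q, [0.5, 0.5], times, emis)
for t, g in zip(times, gamma):
    print(t, g)
```

In this toy setup state 1 is absorbing, so an observation at the last visit that is only consistent with state 0 forces the posterior at every earlier visit onto state 0 as well. The sketch covers only posterior state estimation at the visit times; the end-state conditioned statistics that make up the second challenge are not computed here.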