We consider missing data in hidden Markov models, focusing on situations where data is missing not at random (MNAR) and missingness depends on the identity of the hidden states. In simulations, we show that including a submodel for state-dependent missingness reduces bias when data is MNAR and missingness is state-dependent, whilst not reducing accuracy when data is missing at random (MAR). When missingness depends on time but not on the hidden states, a model which only allows for state-dependent missingness is biased, whilst a model that allows for both state- and time-dependent missingness is not. Overall, these results show that modelling missingness as state-dependent, and including other relevant covariates, is a useful strategy in applications of hidden Markov models to time series with missing data. We conclude with an application of the state- and time-dependent MNAR hidden Markov model to a real dataset on the severity of schizophrenic symptoms in a clinical trial.
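To make the idea of a state-dependent missingness submodel concrete, the following is a minimal sketch of a scaled forward recursion. It is an illustration under stated assumptions, not the implementation used in the paper. It assumes Gaussian state-conditional densities, marks missing values as `np.nan`, and gives each hidden state its own Bernoulli probability of a missing observation. The function name `forward_loglik` and the parameter `p_miss` are hypothetical names introduced here.

```python
import numpy as np


def forward_loglik(y, init, trans, means, sds, p_miss):
    """Log-likelihood of a Gaussian HMM with state-dependent missingness.

    Assumptions (illustrative, not taken from the paper):
      y          : 1-D sequence of observations, np.nan where missing
      init       : (K,) initial state probabilities
      trans      : (K, K) transition matrix, rows sum to 1
      means, sds : (K,) Gaussian emission parameters per state
      p_miss     : (K,) probability that an observation is missing in each state
    """
    y = np.asarray(y, dtype=float)
    trans = np.asarray(trans, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    p_miss = np.asarray(p_miss, dtype=float)

    missing = np.isnan(y)
    alpha = np.asarray(init, dtype=float)
    loglik = 0.0
    for t in range(len(y)):
        if t > 0:
            # Propagate the filtered state distribution one step forward.
            alpha = alpha @ trans
        if missing[t]:
            # Only the missingness indicator is observed: P(missing | state).
            emit = p_miss
        else:
            # Observed value: P(not missing | state) * Gaussian density.
            dens = np.exp(-0.5 * ((y[t] - means) / sds) ** 2) / (
                sds * np.sqrt(2.0 * np.pi)
            )
            emit = (1.0 - p_miss) * dens
        alpha = alpha * emit
        # Rescale at each step to avoid numerical underflow.
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale
    return loglik
```

In this sketch, if every state has the same value of `p_miss`, the missingness terms factor out of the likelihood and carry no information about the hidden states, which corresponds to the MAR case. Time-dependent missingness could be accommodated by letting `p_miss` vary with `t`, for example through a logistic regression on time; that extension is not shown here.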