Missing data are pervasive in statistical applications. Even data as simple as the death counts used in mortality projections may be unavailable for certain age-and-year groups, owing to budget limitations or difficulties in tracing research units, and this leads to inaccuracies in subsequent estimation and prediction. To address this challenge, we extend the Poisson log-normal Lee-Carter model to accommodate a more flexible time structure, and develop a new sampling algorithm that improves MCMC convergence when dealing with incomplete mortality data. Through the overdispersion term and the Gibbs sampler, the extended model can be rewritten as a dynamic linear model, so that both Kalman and sequential Kalman filters can be incorporated into the sampling scheme. Additionally, our carefully chosen prior settings avoid the re-scaling step in each MCMC iteration and allow model selection to be carried out simultaneously with estimation and prediction. The proposed method is applied to mortality data for Chinese males over the period 1995-2016 to yield mortality rate forecasts for 2017-2039. The results are comparable to those based on the imputed data set, suggesting that our approach handles incomplete data well.
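To illustrate why a dynamic linear model formulation is convenient for incomplete data, the following is a minimal sketch of a Kalman filter that skips the update step whenever an observation is missing. It assumes a scalar random-walk state observed with Gaussian noise, which is a deliberate simplification and not the extended Lee-Carter model or the sampler described above. The function name `kalman_filter_local_level` and the parameters `q`, `r`, `m0`, and `p0` are hypothetical, chosen only for this example.

```python
import math


def kalman_filter_local_level(y, q, r, m0=0.0, p0=1e6):
    """Filter a scalar random-walk state observed with Gaussian noise.

    Assumed (hypothetical) model, for illustration only:
        state_t = state_{t-1} + w_t,  w_t ~ N(0, q)
        y_t     = state_t + v_t,      v_t ~ N(0, r)
    Entries of y that are None or NaN are treated as missing.
    Returns the filtered means and variances at each time step.
    """
    m, p = m0, p0
    means, variances = [], []
    for obs in y:
        # Predict: under a random walk the mean is unchanged and
        # the variance grows by the state noise variance q.
        p = p + q
        if obs is not None and not math.isnan(obs):
            # Update: standard Kalman gain for a scalar observation.
            k = p / (p + r)
            m = m + k * (obs - m)
            p = (1.0 - k) * p
        # Missing observation: no update, so the prediction is carried forward.
        means.append(m)
        variances.append(p)
    return means, variances


# Usage: the second observation is missing.
means, variances = kalman_filter_local_level([1.0, None, 3.0], q=1.0, r=1.0, m0=0.0, p0=1.0)
```

At the missing time point the filtered mean is unchanged while the variance grows, so the gap is reflected as additional uncertainty rather than filled by an imputed value.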