Unsupervised learning is often used to uncover clusters in data. However, different kinds of noise may impede the discovery of useful patterns from real-world time-series data. In this work, we focus on mitigating the interference of interval censoring in the task of clustering for disease phenotyping. We develop a deep generative, continuous-time model of time-series data that clusters time series while correcting for censoring time. We provide conditions under which clusters and the amount of delayed entry may be identified from data under a noiseless model. On synthetic data, we demonstrate accurate, stable, and interpretable results that outperform several benchmarks. On real-world clinical datasets of heart failure and Parkinson's disease patients, we study how interval censoring can adversely affect the task of disease phenotyping. Our model corrects for this source of error and recovers known clinical subtypes.