Unsupervised learning is often used to uncover clusters in data. However, different kinds of noise may impede the discovery of useful patterns from real-world time-series data. In this work, we focus on mitigating the interference of interval censoring in the task of clustering for disease phenotyping. We develop a deep generative, continuous-time model of time-series data that clusters time series while correcting for the censoring time. We provide conditions under which the clusters and the amount of delayed entry may be identified from data under a noiseless model.