Tensor factorization has received increasing interest because of its intrinsic ability to capture latent factors in multi-dimensional data, with applications such as recommender systems and Electronic Health Record (EHR) mining. PARAFAC2 and its variants have been proposed to handle irregular tensors, in which one of the tensor modes is not aligned; for example, different users in a recommender system or different patients in EHRs may have records of different lengths. PARAFAC2 has been successfully applied to EHRs to extract meaningful medical concepts (phenotypes). Despite recent advances, the predictive power and interpretability of current models remain unsatisfactory, which limits their utility for downstream analysis. In this paper, we propose MULTIPAR: a supervised irregular tensor factorization with multi-task learning. MULTIPAR is flexible enough to incorporate both static tasks (e.g., in-hospital mortality prediction) and continuous or dynamic tasks (e.g., the need for ventilation). By supervising the tensor factorization with downstream prediction tasks and leveraging information from multiple related predictive tasks, MULTIPAR can yield not only more meaningful phenotypes but also better predictive performance on downstream tasks. We conduct extensive experiments on two real-world temporal EHR datasets to demonstrate that MULTIPAR is scalable and achieves better tensor fit, more meaningful subgroups, and stronger predictive performance than existing state-of-the-art methods.
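To make the idea of a supervised irregular factorization concrete, the following is a minimal sketch, not MULTIPAR's actual formulation (the abstract does not state its objective). It assumes the standard PARAFAC2 model, in which each patient's slice X_k (visits by features) is approximated by Q_k H diag(s_k) V^T with Q_k having orthonormal columns, and adds a single logistic loss on the per-patient weights s_k as a stand-in for one static prediction task. The function names, the weight `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np

def parafac2_reconstruct(Q, H, S, V):
    """Rebuild each slice as Q_k @ H @ diag(S[k]) @ V.T (standard PARAFAC2 form)."""
    return [Q_k @ H @ np.diag(s_k) @ V.T for Q_k, s_k in zip(Q, S)]

def joint_loss(X, Q, H, S, V, w, y, lam=1.0):
    """Squared reconstruction error plus a logistic loss on the patient weights S.

    Assumption: one static binary task with linear classifier weights w;
    lam balances the two terms. This is an illustration only.
    """
    X_hat = parafac2_reconstruct(Q, H, S, V)
    recon = sum(np.linalg.norm(X_k - Xh_k) ** 2 for X_k, Xh_k in zip(X, X_hat))
    logits = S @ w
    # Numerically stable binary cross-entropy: log(1 + exp(z)) - y * z
    bce = np.sum(np.logaddexp(0.0, logits) - y * logits)
    return recon + lam * bce

rng = np.random.default_rng(0)
R, J = 3, 5                  # R latent phenotypes, J medical features
lengths = [4, 7, 6]          # irregular mode: a different number of visits per patient

H = rng.standard_normal((R, R))
V = rng.standard_normal((J, R))
S = rng.random((len(lengths), R))
# Reduced QR of a random (I_k x R) matrix gives Q_k with orthonormal columns
Q = [np.linalg.qr(rng.standard_normal((I_k, R)))[0] for I_k in lengths]

X = parafac2_reconstruct(Q, H, S, V)   # noiseless synthetic slices
y = np.array([0, 1, 1])                # hypothetical static labels (e.g., mortality)
w = np.zeros(R)                        # untrained classifier weights

loss = joint_loss(X, Q, H, S, V, w, y)
```

Because the slices share H and V but have their own Q_k, patients with different record lengths are factorized jointly, and the supervised term ties the per-patient weights in S to the prediction target. A multi-task variant would add one such term per task.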