Tensor factorization has proven to be an efficient unsupervised learning approach for health data analysis, especially for computational phenotyping, where high-dimensional Electronic Health Records (EHRs) containing patients' histories of medical procedures, medications, diagnoses, lab tests, etc., are converted into meaningful and interpretable medical concepts. Federated tensor factorization distributes the tensor computation across multiple workers under the coordination of a central server, which enables multiple hospitals to jointly learn phenotypes while preserving the privacy of patient information. However, because existing federated tensor factorization algorithms rely on a central server, they suffer from a single point of failure: the server is not only easily exposed to external attacks but also limits the number of clients that can share information with it under restricted uplink bandwidth. In this paper, we propose CiderTF, a communication-efficient decentralized generalized tensor factorization, which reduces the uplink communication cost through a four-level communication reduction strategy. The generalized tensor factorization it builds on has the flexibility to model different tensor distributions with multiple kinds of loss functions. Experiments on two real-world EHR datasets demonstrate that CiderTF achieves comparable convergence with a communication reduction of up to 99.99%.
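To make "generalized tensor factorization with multiple kinds of loss functions" concrete, the sketch below shows the generalized CP (GCP) idea on a small dense third-order tensor: the rank-R CP model is fixed, and only the elementwise loss f(x, m) and its derivative change with the assumed data distribution. It is an illustration under stated assumptions, not CiderTF itself: the three loss/link pairs (Gaussian, Bernoulli-logit, Poisson-log), plain full-batch gradient descent, and all function names (`model_entry`, `gcp_loss`, `gcp_gradients`, `gd_step`) are hypothetical choices for this example. It covers neither the decentralized protocol nor the four-level communication reduction.

```python
import math
import random

# Hypothetical loss set: elementwise loss f(x, m) and derivative df/dm, where
# x is an observed entry and m is the CP model value at that entry.
LOSSES = {
    # Gaussian / least squares: continuous values
    "gaussian": (lambda x, m: (x - m) ** 2, lambda x, m: 2.0 * (m - x)),
    # Bernoulli with logit link: binary indicators (e.g. code present / absent)
    "bernoulli_logit": (
        lambda x, m: math.log1p(math.exp(m)) - x * m,
        lambda x, m: 1.0 / (1.0 + math.exp(-m)) - x,
    ),
    # Poisson with log link: event counts
    "poisson_log": (lambda x, m: math.exp(m) - x * m, lambda x, m: math.exp(m) - x),
}


def model_entry(A, B, C, i, j, k):
    """Rank-R CP model value m_ijk = sum_r A[i][r] * B[j][r] * C[k][r]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))


def gcp_loss(X, factors, loss):
    """Sum of elementwise losses over every entry of the dense 3-way tensor X."""
    f, _ = LOSSES[loss]
    A, B, C = factors
    return sum(
        f(X[i][j][k], model_entry(A, B, C, i, j, k))
        for i in range(len(A)) for j in range(len(B)) for k in range(len(C))
    )


def gcp_gradients(X, factors, loss):
    """Gradients of gcp_loss w.r.t. each factor matrix via the chain rule:
    dL/dA[i][r] = sum_{j,k} f'(x_ijk, m_ijk) * B[j][r] * C[k][r], and similarly for B, C."""
    _, df = LOSSES[loss]
    A, B, C = factors
    R = len(A[0])
    gA = [[0.0] * R for _ in A]
    gB = [[0.0] * R for _ in B]
    gC = [[0.0] * R for _ in C]
    for i in range(len(A)):
        for j in range(len(B)):
            for k in range(len(C)):
                y = df(X[i][j][k], model_entry(A, B, C, i, j, k))
                for r in range(R):
                    gA[i][r] += y * B[j][r] * C[k][r]
                    gB[j][r] += y * A[i][r] * C[k][r]
                    gC[k][r] += y * A[i][r] * B[j][r]
    return gA, gB, gC


def gd_step(X, factors, loss, lr):
    """One full-batch gradient descent step on all three factor matrices."""
    grads = gcp_gradients(X, factors, loss)
    return tuple(
        [[v - lr * g for v, g in zip(row, grow)] for row, grow in zip(F, G)]
        for F, G in zip(factors, grads)
    )


# Usage: a tiny random binary tensor (patients x diagnoses x medications, say).
random.seed(0)
I, J, K, R = 4, 3, 2, 2
X = [[[random.randint(0, 1) for _ in range(K)] for _ in range(J)] for _ in range(I)]
factors = tuple(
    [[random.uniform(-0.5, 0.5) for _ in range(R)] for _ in range(n)] for n in (I, J, K)
)
for name in LOSSES:
    before = gcp_loss(X, factors, name)
    after = gcp_loss(X, gd_step(X, factors, name, lr=0.01), name)
    print(name, round(before, 4), round(after, 4))
```

Swapping the loss changes only the key passed to `gcp_loss` and `gcp_gradients`; the CP model and the gradient structure stay the same, which is what makes the formulation "generalized".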