Given the abundance and easy accessibility of personal data today, individual privacy has become paramount, particularly in the healthcare domain. In this work, we aim to use patient data extracted from multiple hospital data centres to train a machine learning model without sacrificing patient privacy. We develop a scheduling algorithm that works in conjunction with a student-teacher algorithm deployed in a federated manner. This allows a central model to learn from batches of data at each federated node. The teacher acts between data centres to update the main task (student) algorithm using the data stored in the various data centres. We show that the scheduler, trained using meta-gradients, can effectively organise training and, as a result, train a machine learning model on a diverse dataset without needing explicit access to the patient data. We achieve state-of-the-art performance and show how our method overcomes some of the problems faced in federated learning, such as node poisoning. We further show how the scheduler can be used as a mechanism for transfer learning, allowing different teachers to work together to train a student to state-of-the-art performance.