Aggregating pharmaceutical data in the drug-target interaction (DTI) domain has the potential to deliver life-saving breakthroughs. It is, however, notoriously difficult because of regulatory constraints and commercial interests. This work proposes applying federated learning (FL), which we argue is reconcilable with the industry's constraints because it does not require sharing any information that would reveal the participating entities' data, or any high-level summary of it. Applied to a representative GraphDTA model on the KIBA dataset, FL achieves up to 15% better performance than the best available non-privacy-preserving alternative. Our extensive battery of experiments shows that, unlike in other domains, the non-IID data distribution in DTI datasets does not degrade FL performance. Additionally, we identify a material trade-off between the benefit of adding new data and the cost of adding more clients.