Aggregating pharmaceutical data in the drug-target interaction (DTI) domain has the potential to deliver life-saving breakthroughs. It is, however, notoriously difficult because of regulatory constraints and commercial interests. This work proposes applying federated learning (FL), which we argue is compatible with the industry's constraints because it does not require sharing any information that would reveal the participating entities' data or any high-level summary of it. When applied to a representative GraphDTA model and the KIBA dataset, FL achieves up to 15% better performance than the best available non-privacy-preserving alternative. Our extensive battery of experiments shows that, unlike in other domains, the non-IID data distribution in DTI datasets does not degrade FL performance. Additionally, we identify a material trade-off between the benefit of adding new data and the cost of adding more clients.
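To make the data-sharing claim concrete, the following is a minimal sketch of one common FL scheme, sample-size-weighted parameter averaging in the style of FedAvg. It rests on several assumptions: the abstract does not state which aggregation rule is used, a two-parameter linear model stands in for GraphDTA, the client data are synthetic rather than drawn from KIBA, and all function names are hypothetical. The point it illustrates is only that clients send model parameters, never their raw records, to the server.

```python
import random


def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few epochs of SGD on one client's private data.

    Only the updated parameters leave the client; `data` is never shared.
    """
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # residual of the linear model
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Sample-size-weighted mean of client parameters (FedAvg-style; assumed rule)."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def make_client_data(rng, n, lo, hi):
    """Synthetic stand-in for one organisation's data: y = 2x + 1 plus small noise."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]


def run_federated_training(rounds=50, seed=0):
    """Train a shared model across three clients with non-IID, unequal-sized data."""
    rng = random.Random(seed)
    # Hypothetical non-IID split: each client observes a different range of x.
    clients = [
        make_client_data(rng, 20, -1.0, 0.0),
        make_client_data(rng, 40, 0.0, 1.0),
        make_client_data(rng, 10, -0.5, 0.5),
    ]
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        updates = [local_update(global_weights, data) for data in clients]
        # The server sees only parameters and sample counts.
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights
```

Calling `run_federated_training()` returns the final `(w, b)` pair, which in this toy setting should lie close to the generating values of 2 and 1 even though no client covers the full input range. Parameter averaging alone is not a formal privacy guarantee, so this sketch should not be read as a complete privacy-preserving protocol.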