Federated Learning of Molecular Properties with Graph Neural Networks in a Heterogeneous Setting

Experiments in chemistry research carry high material and computational costs. Institutions therefore treat chemical data as valuable, and there have been few efforts to construct large public datasets for machine learning. Another challenge is that different institutions are interested in different classes of molecules, which creates heterogeneous data that cannot easily be combined by conventional distributed training. In this work, we introduce federated heterogeneous molecular learning to address these challenges. Federated learning allows end-users to build a global model collaboratively while keeping the training data distributed over isolated clients. Because related research is lacking, we first simulate a heterogeneous federated learning benchmark (FedChem) by jointly performing scaffold splitting and latent Dirichlet allocation on existing datasets to obtain heterogeneously distributed client data. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules across clients. We then propose a method to alleviate the problem, namely Federated Learning by Instance reweighTing (FLIT(+)). FLIT(+) can align the local training across heterogeneous clients by improving the performance for uncertain samples. Comprehensive experiments conducted on our new benchmark FedChem validate the advantages of this method over other federated learning schemes. FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about sharing valuable chemical data.
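To make the idea of heterogeneously distributed client data concrete, here is a minimal sketch of one way to skew scaffold groups across clients with Dirichlet-distributed proportions. It is an illustration under stated assumptions, not the FedChem procedure itself: the function names `dirichlet` and `heterogeneous_split`, the concentration parameter `alpha`, and the precomputed scaffold keys are all hypothetical.

```python
import random
from collections import defaultdict


def dirichlet(alpha, k, rng):
    """Sample a symmetric k-dimensional Dirichlet(alpha) vector via Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # Guard against numerical underflow for very small alpha.
        return [1.0 / k] * k
    return [d / total for d in draws]


def heterogeneous_split(scaffolds, num_clients, alpha, seed=0):
    """Assign molecule indices to clients, with one Dirichlet draw per scaffold.

    scaffolds: one scaffold key per molecule (assumed to be precomputed).
    alpha: assumed concentration; smaller values give more skewed clients.
    """
    rng = random.Random(seed)

    # Group molecule indices by their scaffold key.
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    clients = [[] for _ in range(num_clients)]
    for key in sorted(groups):
        # Each scaffold gets its own client proportions, so clients end up
        # dominated by different scaffolds when alpha is small.
        probs = dirichlet(alpha, num_clients, rng)
        for idx in groups[key]:
            client = rng.choices(range(num_clients), weights=probs, k=1)[0]
            clients[client].append(idx)
    return clients


# Toy example: 18 molecules sharing three scaffold keys.
toy_scaffolds = ["c1ccccc1"] * 6 + ["C1CCCCC1"] * 6 + ["c1ccncc1"] * 6
toy_clients = heterogeneous_split(toy_scaffolds, num_clients=3, alpha=0.5)
for client_id, members in enumerate(toy_clients):
    print(client_id, sorted(members))
```

Every molecule lands on exactly one client, and lowering `alpha` concentrates each scaffold on fewer clients, which is the kind of cross-client mismatch the abstract describes as a learning challenge.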


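Federated learning

Federated learning is an emerging foundational technique in artificial intelligence. It was first proposed by Google in 2016, originally to let Android phone users update models locally on their devices. Its design goal is to enable efficient machine learning across multiple participants or compute nodes while safeguarding information security during large-scale data exchange, protecting device data and personal privacy, and remaining legally compliant. The machine learning algorithms that can be used in federated learning are not limited to neural networks; they also include other important algorithms such as random forests. Federated learning is expected to become a foundation for the next generation of collaborative AI algorithms and collaboration networks.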
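As a rough illustration of how a global model can be built without moving the training data, the sketch below shows the server-side aggregation step of federated averaging (FedAvg), a common baseline. The text above does not name a specific aggregation rule, so FedAvg is an assumption here, as are the function name `federated_average` and the representation of model parameters as flat lists of floats.

```python
def federated_average(client_weights, client_sizes):
    """Return the sample-size-weighted average of client parameter vectors.

    client_weights: one flat list of floats per client (assumed representation).
    client_sizes: number of local training samples held by each client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        # Clients with more local data contribute proportionally more.
        for j, w in enumerate(weights):
            averaged[j] += w * size / total
    return averaged


# Two hypothetical clients holding 1 and 3 samples respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

Only parameter vectors and sample counts reach the server in this sketch; the raw data stays with each client.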
