Data privacy and data sharing have always been critical issues when building complex deep learning-based systems to model data. Decentralized approaches that benefit from data held across multiple nodes without physically merging that data are an area of active research. In this paper, we present a solution for training deep learning architectures on distributed data by making use of a smart contract system. Specifically, we propose a mechanism that aggregates, over a blockchain, the intermediate representations obtained from local ANN models. Each local model is trained on its own node's data. The intermediate representations derived from these models, when combined and trained together on the host node, help to yield a more accurate system. While federated learning primarily deals with data that share the same features but whose samples are distributed across multiple nodes, here we deal with the same set of samples whose features are distributed across multiple nodes. We consider the task of bank loan prediction, wherein the personal details of an individual and their bank-specific details may not be available in the same place. Our aggregation mechanism helps to train a model on such distributed data without having to share and concatenate the actual data values. The obtained performance, which is better than that of the individual nodes and on par with that of a centralized data setup, makes a strong case for extending our technique to other architectures and tasks. The solution finds application in organizations that want to train deep learning models on vertically partitioned data.
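The aggregation idea can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes two nodes holding disjoint feature columns for the same 400 synthetic samples, labels available at every node, a one-hidden-layer network per node, and a logistic-regression host. The blockchain and smart contract layer is not modeled: a plain array concatenation stands in for it. All names (`train_local`, `represent`, `train_host`, `X_personal`, `X_bank`) and all hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_local(X, y, hidden=4, epochs=300, lr=0.5, seed=0):
    """Train a one-hidden-layer network on one node's feature slice.

    Returns only the hidden-layer parameters, which produce the
    intermediate representation that the node would share.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # hidden activations
        p = sigmoid(H @ w2 + b2)            # predicted probability
        g = (p - y) / n                     # grad of mean cross-entropy w.r.t. logits
        gH = np.outer(g, w2) * (1.0 - H**2) # backprop through tanh (uses w2 before update)
        w2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1

def represent(X, W1, b1):
    """Intermediate representation: hidden activations, not raw features."""
    return np.tanh(X @ W1 + b1)

def train_host(H, y, epochs=500, lr=0.5):
    """Host model (assumed here to be logistic regression) on combined representations."""
    w = np.zeros(H.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        g = (sigmoid(H @ w + b) - y) / n
        w -= lr * (H.T @ g)
        b -= lr * g.sum()
    return w, b

# Synthetic, vertically partitioned data: same samples, different features per node.
rng = np.random.default_rng(42)
n = 400
X_personal = rng.normal(size=(n, 3))   # hypothetical "personal details" node
X_bank = rng.normal(size=(n, 3))       # hypothetical "bank-specific details" node
y = ((X_personal[:, 0] + X_bank[:, 0]) > 0).astype(float)

# Each node trains locally on its own columns only.
W_a, b_a = train_local(X_personal, y, seed=1)
W_b, b_b = train_local(X_bank, y, seed=2)

# Only representations leave the nodes; concatenation stands in for the
# blockchain-based aggregation described in the abstract.
H = np.concatenate(
    [represent(X_personal, W_a, b_a), represent(X_bank, W_b, b_b)], axis=1
)

w, b = train_host(H, y)
acc = ((sigmoid(H @ w + b) > 0.5) == (y > 0.5)).mean()
print(f"host training accuracy on combined representations: {acc:.3f}")
```

In this toy setup the label depends on one feature from each node, so neither node can predict it well alone, while the host sees a representation of both. The raw feature matrices are never concatenated, which is the property the abstract emphasizes.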