Recent progress in materials data mining has been driven by high-capacity models trained on large datasets. Collecting experimental data, however, is extremely costly because of the human effort and expertise it requires. Materials researchers are therefore often reluctant to disclose their private data, which creates isolated data silos ("data islands") and makes it difficult to gather enough data to train high-quality models. In this study, FedTransfer, a privacy-preserving feature extraction algorithm for material microstructure images, is proposed. The core contributions are as follows: 1) federated learning is introduced into the polycrystalline microstructure image segmentation task, so that data held by different users can be used for training while its privacy and security are preserved, which breaks down data islands and improves model generalization; 2) a data sharing strategy based on style transfer is proposed: by sharing the style information of images, which is less sensitive with respect to user confidentiality, it reduces the performance penalty caused by differences in data distribution among users.
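The two contributions can be illustrated with a minimal sketch; it is not the FedTransfer implementation. The abstract does not specify the aggregation rule or how style is represented, so the code assumes FedAvg-style weighted averaging of client parameters and uses the mean and standard deviation of pixel intensities as a simple stand-in for shared style information. All function names (`federated_average`, `style_statistics`, `restyle`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Weights = Dict[str, List[float]]
Image = Sequence[Sequence[float]]


def federated_average(client_weights: Sequence[Weights],
                      client_sizes: Sequence[int]) -> Weights:
    """Average client parameters, weighted by local dataset size.

    Assumption: FedAvg-style aggregation. Only parameters leave each
    client; the raw microstructure images never do.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    averaged: Weights = {}
    for name, first in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(first))
        ]
    return averaged


def style_statistics(image: Image) -> Tuple[float, float]:
    """Return (mean, std) of pixel intensities for a grayscale image.

    Assumption: these two numbers stand in for the "style information"
    a client would share in place of the image itself.
    """
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, math.sqrt(variance)


def restyle(image: Image, target_mean: float, target_std: float,
            eps: float = 1e-8) -> List[List[float]]:
    """Shift a local image toward another client's shared statistics."""
    mean, std = style_statistics(image)
    return [[(p - mean) / (std + eps) * target_std + target_mean for p in row]
            for row in image]
```

In this sketch, each client would call `style_statistics` on its own images and share only the resulting pairs; other clients could then call `restyle` on local images to mimic the shared appearance during training, and a server would combine the locally trained parameters with `federated_average`. A real style-transfer model captures far richer structure than two scalar statistics.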