Data privacy and ownership are central concerns in social data science, raising both legal and ethical questions. Sharing and analyzing data is difficult when different parties own different parts of it. One approach to this challenge is to apply de-identification or anonymization techniques to the data before collecting it for analysis. However, this can reduce data utility and may still leave the data open to re-identification. To address these limitations, we present PADME, a distributed analytics tool that federates model implementation and training. In PADME's federated approach, the model is implemented and deployed by all parties and visits each data location incrementally for training. This enables analysis of data across locations while still allowing the model to be trained as if all data were in a single location. Because the model is trained on data in its original location, data ownership is preserved. Furthermore, results are withheld until the analysis has completed at all data locations, to ensure privacy and avoid bias in the results.
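
The incremental scheme described above, in which one model travels from data location to data location, can be illustrated with a minimal sketch. This is not PADME's API or implementation: the function names (`local_update`, `train_incrementally`, `make_station`), the logistic-regression model, the synthetic data, and all hyperparameters are assumptions chosen only to make the idea concrete. Each "station" keeps its own data, only the model weights move between stations, and the weights are returned only after every station has been visited.

```python
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=50):
    """Refine logistic-regression weights using one station's local data only."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-(X @ w)))  # sigmoid of the linear scores
        grad = X.T @ (preds - y) / len(y)       # gradient of the mean log-loss
        w -= lr * grad
    return w


def train_incrementally(stations, n_features, rounds=5):
    """Carry the weights from station to station; raw data never leaves a station.

    The weights are returned only after all stations have been visited in
    every round, mirroring the idea of withholding results until the end.
    """
    w = np.zeros(n_features)
    for _ in range(rounds):
        for X, y in stations:
            w = local_update(w, X, y)
    return w


def make_station(rng, n, true_w):
    """Create synthetic data for one hypothetical data location."""
    X = rng.normal(size=(n, true_w.size))
    y = (X @ true_w > 0).astype(float)
    return X, y


rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])  # assumed ground truth for the toy data
stations = [make_station(rng, 200, true_w) for _ in range(3)]

final_w = train_incrementally(stations, n_features=3)

# Evaluate on fresh synthetic data drawn from the same distribution.
X_test, y_test = make_station(rng, 500, true_w)
accuracy = float(np.mean(((X_test @ final_w) > 0) == (y_test == 1.0)))
print(f"accuracy on held-out synthetic data: {accuracy:.2f}")
```

In this sketch the only object passed between stations is the weight vector `w`. A real deployment would additionally need transport, access control, and safeguards against information leaking through the model itself, none of which are shown here.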