PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences

Analytics · Distributed machine learning · Anonymization techniques · Anonymization · Analysis tools

April 3, 2023

Zeyd Boukhers, Arnim Bleier, Yeliz Ucer Yediel, Mio Hienstorfer-Heitmann, Mehrshad Jaberansary, Adamantios Koumpis, Oya Beyan

From arXiv; accepted for publication at ACM/IEEE JCDL 2023 (Joint Conference on Digital Libraries).

Data privacy and ownership are significant in social data science, raising legal and ethical concerns. Sharing and analyzing data is difficult when different parties own different parts of it. An approach to this challenge is to apply de-identification or anonymization techniques to the data before collecting it for analysis. However, this can reduce data utility and increase the risk of re-identification. To address these limitations, we present PADME, a distributed analytics tool that federates model implementation and training. PADME uses a federated approach where the model is implemented and deployed by all parties and visits each data location incrementally for training. This enables the analysis of data across locations while still allowing the model to be trained as if all data were in a single location. Training the model on data in its original location preserves data ownership. Furthermore, the results are not provided until the analysis is completed on all data locations to ensure privacy and avoid bias in the results.
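The incremental training idea in the abstract can be illustrated with a minimal sketch. This is not PADME's actual API or implementation: `Station`, `LinearModel`, and `run_incremental_training` are hypothetical names, the toy linear model and plain stochastic gradient descent are assumptions chosen for brevity, and the data is synthetic. The sketch only shows one model travelling from location to location while each location's samples stay where they are, with the result returned after every location has been visited.

```python
from dataclasses import dataclass
from typing import List, Tuple

Sample = Tuple[float, float]  # one (feature, target) pair


@dataclass
class LinearModel:
    """Toy model y = w * x + b, standing in for any trainable model."""
    w: float = 0.0
    b: float = 0.0

    def sgd_step(self, x: float, y: float, lr: float) -> None:
        # One stochastic gradient step on the squared error of a single sample.
        error = (self.w * x + self.b) - y
        self.w -= lr * error * x
        self.b -= lr * error


@dataclass
class Station:
    """Hypothetical data location: samples are only read inside this class."""
    name: str
    samples: List[Sample]

    def train_locally(self, model: LinearModel, lr: float = 0.05) -> LinearModel:
        # The visiting model is updated in place on this station's own data.
        for x, y in self.samples:
            model.sgd_step(x, y, lr)
        return model


def run_incremental_training(stations: List[Station], rounds: int = 200) -> LinearModel:
    """Send one model from station to station; return it only at the end."""
    model = LinearModel()
    for _ in range(rounds):
        for station in stations:  # the model travels, the data does not
            model = station.train_locally(model)
    return model  # nothing is released before every station has been visited


# Three parties each hold a different slice of synthetic data from y = 2x + 1.
stations = [
    Station("site_a", [(-1.0, -1.0), (-0.5, 0.0)]),
    Station("site_b", [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]),
    Station("site_c", [(1.5, 4.0), (2.0, 5.0)]),
]
final_model = run_incremental_training(stations)
print(f"w={final_model.w:.3f}, b={final_model.b:.3f}")
```

On this noise-free toy data the learned parameters should come out close to w = 2 and b = 1, even though no single station holds enough of the range on its own to be treated as the full dataset. A real deployment would also need access control, secure transport, and checks on what the model itself may leak, none of which this sketch attempts.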

