In cross-silo federated learning, clients (e.g., organizations) train a shared global model using local data. However, due to privacy concerns, the clients may not contribute enough data points during training. To address this issue, we propose a general incentive framework in which the profit/benefit obtained from the global model is appropriately allocated to clients to incentivize data contribution. We formulate the clients' interactions as a data contribution game and study its equilibrium. We characterize conditions under which an equilibrium exists, and prove that each client's equilibrium data contribution increases in its data quality and decreases in its privacy sensitivity. We further conduct experiments using CIFAR-10 and show that the results are consistent with the analysis. Moreover, we show that practical allocation mechanisms such as linearly proportional, leave-one-out, and Shapley-value allocation incentivize more data contribution from clients with higher-quality data, among which leave-one-out tends to achieve the highest global model accuracy at equilibrium.
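The three allocation mechanisms named above can be contrasted with a minimal sketch. The client contributions and the coalition value function `v` below are illustrative assumptions (a concave stand-in for the global model's benefit), not taken from the paper; the sketch only shows how each mechanism splits the grand-coalition value among clients.

```python
from itertools import permutations
from math import factorial

# Hypothetical clients and their data contributions (illustrative values).
contrib = {"A": 4.0, "B": 2.0, "C": 1.0}

def v(coalition):
    # Illustrative coalition value with diminishing returns in total data;
    # a stand-in for global-model benefit, not the paper's actual model.
    total = sum(contrib[c] for c in coalition)
    return total / (1.0 + total)

clients = list(contrib)
grand = v(clients)  # value of the grand coalition v(N)

# Linearly proportional: split v(N) in proportion to data contributed.
prop = {c: grand * contrib[c] / sum(contrib.values()) for c in clients}

# Leave-one-out: each client receives its marginal value v(N) - v(N \ {i}).
loo = {c: grand - v([x for x in clients if x != c]) for c in clients}

# Shapley value: average marginal contribution over all join orders.
shapley = {c: 0.0 for c in clients}
for order in permutations(clients):
    coalition = []
    for c in order:
        before = v(coalition)
        coalition.append(c)
        shapley[c] += v(coalition) - before
shapley = {c: s / factorial(len(clients)) for c, s in shapley.items()}
```

Under any concave increasing `v`, all three mechanisms pay clients with larger contributions more, which is the monotonicity that drives the incentive result; they differ in how the total value is divided (e.g., proportional and Shapley allocations sum exactly to `v(N)`, while leave-one-out payments in general do not).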