In collaborative data sharing and machine learning, multiple parties pool their data to train a machine learning model with better performance. However, because the parties incur data collection costs, they are willing to do so only when the collaboration guarantees incentives such as fairness and individual rationality. Existing frameworks assume that all parties join the collaboration simultaneously, an assumption that does not hold in many real-world scenarios. Parties may join at different times because of long data-cleaning times, difficulty in overcoming legal barriers, or lack of awareness. In this work, we propose the following perspective: because a party that joins earlier incurs higher risk and encourages contributions from other wait-and-see parties, it should receive a reward of higher value for sharing its data earlier. To this end, we propose a fair and time-aware data sharing framework that includes novel time-aware incentives. We develop new methods for deciding reward values that satisfy these incentives. We further illustrate how to generate model rewards that realize the reward values, and we empirically demonstrate the properties of our methods on synthetic and real-world datasets.