The increasing uptake of machine learning techniques requires ever more application-specific training data. Manually collecting such training data is a time-consuming and error-prone process. Data marketplaces represent a compelling alternative, providing an easy way to acquire data from potential data providers. A key component of such marketplaces is the compensation mechanism for data providers. Classic payoff-allocation methods, such as the Shapley value, can be vulnerable to data-replication attacks, and are infeasible to compute in the absence of efficient approximation algorithms. To address these challenges, we present an extensive theoretical study of the vulnerabilities of game-theoretic payoff-allocation schemes to replication attacks. Our insights apply to a wide range of payoff-allocation schemes, and enable the design of customised replication-robust payoff-allocations. Furthermore, we present a novel, efficient sampling algorithm for approximating payoff-allocation schemes based on marginal contributions. In our experiments, we validate the replication-robustness of classic payoff-allocation schemes and of new payoff-allocation schemes derived from our theoretical insights. We also demonstrate the efficiency of our proposed sampling algorithm on a wide range of machine learning tasks.
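To make the replication vulnerability concrete, the following is a minimal sketch under stated assumptions, not the algorithm proposed in this work. It assumes a toy coverage utility (a coalition's value is the number of distinct data points it holds) and hypothetical providers named `alice` and `bob`. It computes the Shapley value exactly by enumerating all orderings, and also estimates it with standard permutation sampling of marginal contributions, which is a textbook baseline rather than the sampling algorithm presented here.

```python
import itertools
import random


def coverage_value(coalition, holdings):
    """Toy utility (an assumption): number of distinct data points covered."""
    covered = set()
    for player in coalition:
        covered |= holdings[player]
    return len(covered)


def add_marginals(order, holdings, totals):
    """Add each player's marginal contribution along one ordering to totals."""
    coalition = []
    previous = 0
    for player in order:
        coalition.append(player)
        current = coverage_value(coalition, holdings)
        totals[player] += current - previous
        previous = current


def exact_shapley(holdings):
    """Exact Shapley value: average marginal contribution over all orderings."""
    players = sorted(holdings)
    totals = {p: 0.0 for p in players}
    orderings = list(itertools.permutations(players))
    for order in orderings:
        add_marginals(order, holdings, totals)
    return {p: totals[p] / len(orderings) for p in players}


def sampled_shapley(holdings, num_samples, seed=0):
    """Monte Carlo estimate: average over uniformly random orderings."""
    rng = random.Random(seed)
    players = sorted(holdings)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = players[:]
        rng.shuffle(order)
        add_marginals(order, holdings, totals)
    return {p: totals[p] / num_samples for p in players}


# Hypothetical marketplace: two providers with partially overlapping data.
honest = {"alice": {1, 2}, "bob": {2, 3}}
# Replication attack: alice submits her data a second time under a new identity.
attacked = {"alice": {1, 2}, "alice_copy": {1, 2}, "bob": {2, 3}}

if __name__ == "__main__":
    before = exact_shapley(honest)
    after = exact_shapley(attacked)
    print("alice, honest:    ", before["alice"])
    print("alice, replicated:", after["alice"] + after["alice_copy"])
    print("sampled estimate: ", sampled_shapley(attacked, 5000))
```

In this toy game the replica adds no value to the grand coalition, yet the combined payoff of `alice` and `alice_copy` exceeds what `alice` receives when reporting honestly, which is the kind of gain a replication-robust allocation is meant to rule out.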