"How much is my data worth?" is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value. However, the Shapley value often requires exponential time to compute. To meet this challenge, we propose a repertoire of efficient algorithms for approximating the Shapley value. We also demonstrate the value of each training instance for various benchmark datasets.
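To make the computational issue concrete, the sketch below contrasts the exact Shapley value, which enumerates every subset of contributors, with a generic permutation-sampling estimate. This is an illustration only and is not claimed to be one of the algorithms proposed in the paper. The names `exact_shapley`, `monte_carlo_shapley`, `toy_utility`, and the `QUALITY` table are hypothetical, and the utility function is a stand-in for "model performance when trained on this subset of data".

```python
import itertools
import math
import random


def exact_shapley(players, utility):
    """Exact Shapley values by enumerating all subsets (exponential in len(players))."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain player i.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                total += weight * (utility(s | {i}) - utility(s))
        values[i] = total
    return values


def monte_carlo_shapley(players, utility, num_permutations=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions over random orderings."""
    rng = random.Random(seed)
    values = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = frozenset()
        previous = utility(coalition)
        for p in order:
            coalition = coalition | {p}
            current = utility(coalition)
            values[p] += current - previous  # marginal contribution of p in this ordering
            previous = current
    return {p: v / num_permutations for p, v in values.items()}


# Hypothetical toy setting: each contributor has a fixed "quality" score, and the
# utility of a coalition shows diminishing returns in the total quality it holds.
QUALITY = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 0.0}


def toy_utility(coalition):
    return math.sqrt(sum(QUALITY[p] for p in coalition))


if __name__ == "__main__":
    players = list(QUALITY)
    print("exact:    ", exact_shapley(players, toy_utility))
    print("estimated:", monte_carlo_shapley(players, toy_utility))
```

In this toy setting, contributor `d` adds nothing to any coalition and therefore receives a value of zero, while the values of all contributors sum to the utility of the full dataset. The exact routine evaluates the utility on the order of 2^n times for n contributors, whereas the sampling estimate evaluates it `num_permutations * n` times, which is the kind of trade-off that motivates approximation.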