In many applications, an organization may want to acquire data from many data owners. Data marketplaces allow data owners to form coalitions that produce the data assemblage needed by data buyers. To encourage coalitions to produce data, it is critical to allocate revenue to data owners fairly, according to their contributions. Although Shapley fairness and its alternatives have been well explored in the literature to facilitate revenue allocation in data assemblage, computing the exact Shapley value for many data owners and large assembled data sets remains challenging due to the combinatorial nature of the Shapley value. In this paper, we explore the decomposability of utility in data assemblage by formulating the independent utility assumption. We argue that independent utility has many applications. Moreover, we identify interesting properties of independent utility and develop fast techniques for computing the exact Shapley value under independent utility. Our experimental results on a series of benchmark data sets show that our new approach not only guarantees the exactness of the Shapley value, but also achieves computation that is faster by orders of magnitude.
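To make the combinatorial cost mentioned in the abstract concrete, the following is a minimal sketch of exact Shapley value computation by brute-force enumeration of coalitions. It is not the paper's fast technique and does not implement the independent utility assumption. The names `exact_shapley`, `holdings`, and `coverage` are hypothetical, and the toy utility (the number of distinct records a coalition holds) is an assumption chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, utility):
    """Exact Shapley values by enumerating every coalition.

    For each player this evaluates the utility on all 2^(n-1) coalitions
    of the other players, so the cost grows exponentially with n.
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for coalitions of size k: k!(n-k-1)!/n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when joining coalition s
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


# Hypothetical toy data: each owner holds a set of record IDs (illustrative only).
holdings = {"A": {1, 2}, "B": {2, 3}, "C": {4}}


def coverage(coalition):
    """Toy utility (an assumption): number of distinct records in the coalition."""
    covered = set()
    for owner in coalition:
        covered |= holdings[owner]
    return len(covered)


shapley = exact_shapley(list(holdings), coverage)
print(shapley)
```

In this toy setting the three values sum to the utility of the full coalition, which is the efficiency property of the Shapley value. With n owners the loop visits 2^(n-1) coalitions per owner, which is why exact computation becomes impractical without additional structure in the utility.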