The design of data markets has gained importance as firms increasingly use predictions from machine learning models to streamline operations, yet must acquire training data externally to fit such models. One aspect that has received limited consideration is the externality a firm faces when data is allocated to competing firms. Such externalities couple firms' optimal allocations, despite the inherent free replicability of data. In this paper, we demonstrate that the presence of externalities increases the optimal revenue of a monopolistic data seller by letting firms pay to prevent allocations to competing firms. We show this by first reducing the combinatorial problem of allocating and pricing multiple datasets to the auction of a single digital good, which we achieve by modeling utility for data solely through the increase in prediction accuracy it provides. We then find the welfare-maximizing and revenue-maximizing mechanisms, highlighting how the form of firms' private information (whether they know the externalities they exert on others or vice versa) affects the mechanisms' overall structure. In all cases, the optimal allocation rule is a threshold rule (one threshold per firm) under which either all data is allocated or none is.