The design of data markets has gained importance as firms increasingly use machine learning models fueled by externally acquired training data. A key consideration is the externalities firms face when data, though inherently freely replicable, is allocated to competing firms. In this setting, we demonstrate that a data seller's optimal revenue increases when firms can pay to prevent allocations to others. To do so, we first reduce the combinatorial problem of allocating and pricing multiple datasets to the auction of a single digital good by modeling utility for data through the increase in prediction accuracy it provides. We then derive welfare- and revenue-maximizing mechanisms, highlighting how the form of firms' private information (whether the externalities a firm exerts on others are known, or vice versa) affects the resulting structures. In all cases, under appropriate assumptions, the optimal allocation rule is a single threshold per firm, where either all data is allocated or none is.