As machine learning (ML) is deployed by many competing service providers, the underlying ML predictors also compete against each other, and it is increasingly important to understand the impacts and biases of such competition. In this paper, we study what happens when competing predictors can acquire additional labeled data to improve their prediction quality. We introduce a new environment in which ML predictors use active learning algorithms to purchase labeled data within their budgets while competing against each other to attract users. Our environment models a critical aspect of data acquisition in competing systems that has not been well studied before. We find that the overall performance of an ML predictor improves when predictors can purchase additional labeled data. Surprisingly, however, the quality that users experience (i.e., the accuracy of the predictor selected by each user) can decrease even as the individual predictors get better. We show that this phenomenon arises naturally from a trade-off: competition pushes each predictor to specialize in a subset of the population, while data purchase tends to make predictors more uniform. We support our findings with both experiments and theory.
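The gap between predictor accuracy and user-experienced quality can be made concrete with a toy calculation. The sketch below is a hypothetical illustration, not the paper's environment: the correctness matrices are invented, and it assumes (for simplicity) that each user picks a predictor that is correct on them whenever one exists. Two specialized predictors that each cover a disjoint half of the users have low individual accuracy but serve every user well; two more uniform predictors that are each more accurate, but correct on the same users, leave the remaining users with no good option.

```python
from typing import List

# Hypothetical toy model: correct[p][u] == 1 means predictor p is right on user u.


def predictor_accuracies(correct: List[List[int]]) -> List[float]:
    """Accuracy of each predictor over the whole user population."""
    return [sum(row) / len(row) for row in correct]


def user_quality(correct: List[List[int]]) -> float:
    """Average accuracy experienced by users.

    Assumption (not from the paper): each user selects a predictor that is
    correct on them whenever at least one such predictor exists.
    """
    n_users = len(correct[0])
    served = sum(1 for u in range(n_users) if any(row[u] for row in correct))
    return served / n_users


# 10 users. Specialized predictors each cover a disjoint half of the users.
specialized = [
    [1] * 5 + [0] * 5,
    [0] * 5 + [1] * 5,
]

# After (hypothetical) data purchase: both predictors are more accurate,
# but they are now correct on the same 7 users.
uniform = [
    [1] * 7 + [0] * 3,
    [1] * 7 + [0] * 3,
]

print(predictor_accuracies(specialized), user_quality(specialized))  # [0.5, 0.5] 1.0
print(predictor_accuracies(uniform), user_quality(uniform))          # [0.7, 0.7] 0.7
```

Here each predictor's accuracy rises from 0.5 to 0.7, yet the quality users experience falls from 1.0 to 0.7, mirroring the specialization-versus-uniformity trade-off described above in a deliberately simplified setting.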