This paper studies how competition affects machine learning (ML) predictors. As ML becomes more ubiquitous, it is often deployed by companies to compete over customers. For example, digital platforms like Yelp use ML to predict user preferences and make recommendations. A service that is queried more often by users, perhaps because it anticipates their preferences more accurately, is also more likely to obtain additional user data (e.g., in the form of a Yelp review). Thus, competing predictors create feedback loops in which a predictor's performance affects what training data it receives and biases its predictions over time. We introduce a flexible model of competing ML predictors that enables both rapid experimentation and theoretical tractability. We show with empirical and mathematical analysis that competition causes predictors to specialize for specific sub-populations at the cost of worse performance over the general population. We further analyze the impact of predictor specialization on the overall prediction quality experienced by users. We show that having too few or too many competing predictors in a market can hurt the overall prediction quality. Our theory is complemented by experiments on several real datasets using popular learning algorithms, such as neural networks and nearest-neighbor methods.
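The feedback loop described above can be illustrated with a toy simulation. This is a minimal sketch and not the model from the paper: the 1-nearest-neighbor predictors, the banded `true_label` function, the rule that a user picks uniformly among the predictors that answer correctly, and all names and parameter values are assumptions made for illustration only.

```python
import random


def nearest_label(data, x):
    """1-nearest-neighbor prediction from a list of (feature, label) pairs."""
    return min(data, key=lambda point: abs(point[0] - x))[1]


def true_label(x):
    # Hypothetical ground truth: the label alternates across four bands of [0, 1).
    return int(x * 4) % 2


def simulate(num_predictors=2, rounds=500, seed=0):
    """Run the toy competition and return each predictor's collected data."""
    rng = random.Random(seed)
    # Assumption: every predictor starts with a single seed example.
    datasets = []
    for _ in range(num_predictors):
        x = rng.random()
        datasets.append([(x, true_label(x))])
    for _ in range(rounds):
        x = rng.random()
        y = true_label(x)
        # Predictors whose current answer for this user is correct.
        correct = [i for i, d in enumerate(datasets) if nearest_label(d, x) == y]
        # Assumption: the user picks uniformly among correct predictors,
        # or uniformly among all predictors if none is correct.
        chosen = rng.choice(correct if correct else range(num_predictors))
        # Only the chosen predictor observes the new labeled example.
        datasets[chosen].append((x, y))
    return datasets


def left_half_share(data):
    """Fraction of a predictor's data with feature below 0.5."""
    return sum(1 for x, _ in data if x < 0.5) / len(data)


if __name__ == "__main__":
    for i, data in enumerate(simulate()):
        print(f"predictor {i}: {len(data)} points, "
              f"{left_half_share(data):.2f} from the left half")
```

Comparing `left_half_share` across predictors, and across different values of `num_predictors`, is one simple way to look for the kind of specialization the abstract describes; whether and how strongly it appears in this toy setting depends on the assumptions above.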