In performative prediction, the deployment of a predictive model triggers a shift in the data distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy a model to get feedback about the distribution it induces. We study the problem of finding near-optimal models under performativity while maintaining low regret. On the surface, this problem might seem equivalent to a bandit problem. However, it exhibits a fundamentally richer feedback structure that we refer to as performative feedback: after every deployment, the learner receives samples from the shifted distribution rather than only bandit feedback about the reward. Our main contribution is regret bounds that scale only with the complexity of the distribution shifts and not that of the reward function. The key algorithmic idea is careful exploration of the distribution shifts that informs a novel construction of confidence bounds on the risk of unexplored models. The construction only relies on smoothness of the shifts and does not assume convexity. More broadly, our work establishes a conceptual approach for leveraging tools from the bandits literature for the purpose of regret minimization with performative feedback.
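To make the feedback structure and the confidence-bound idea concrete, the following is a minimal Python sketch on a hypothetical one-dimensional instance. The distribution map, the absolute loss, the sensitivity constant EPS_SENS, and the helpers deploy and risk_bounds are all illustrative assumptions rather than the paper's algorithm; the sketch only shows how samples from a few deployed models, combined with a smoothness (sensitivity) bound on the shifts, give upper and lower confidence bounds on the performative risk of models that were never deployed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance (all names and constants are illustrative, not
# from the paper): a scalar model theta, a loss that is 1-Lipschitz in the
# sample z, and a distribution map D(theta) whose mean moves smoothly with
# theta (epsilon-sensitivity in Wasserstein-1 distance).
EPS_SENS = 0.5   # W1(D(theta), D(theta')) <= EPS_SENS * |theta - theta'|
LIP_Z = 1.0      # Lipschitz constant of the loss in z

def loss(z, theta):
    # Absolute loss: Lipschitz in z, so a Wasserstein-1 shift of the data
    # distribution changes the risk by at most LIP_Z times that distance.
    return np.abs(z - theta)

def deploy(theta, n=2000):
    # Deploying theta triggers a shift: samples arrive from D(theta),
    # modeled here as a Gaussian whose mean depends on theta.
    return rng.normal(1.0 + EPS_SENS * theta, 1.0, size=n)

# Performative feedback: full samples from the distributions induced by a
# few deployed models, not just their realized rewards.
deployed = [-1.0, 0.0, 1.0]
samples = {t: deploy(t) for t in deployed}

def risk_bounds(theta):
    # Confidence interval for the performative risk
    #   PR(theta) = E_{z ~ D(theta)}[loss(z, theta)]
    # of an UNEXPLORED theta, built only from samples of deployed models:
    # the empirical risk under D(t) plus a slack proportional to
    # |theta - t|, which covers the unobserved shift from D(t) to D(theta).
    ucbs, lcbs = [], []
    for t, z in samples.items():
        emp = loss(z, theta).mean()                 # estimates E_{z~D(t)}[loss(z, theta)]
        stat = 1.0 / np.sqrt(len(z))                # crude statistical slack
        shift = LIP_Z * EPS_SENS * abs(theta - t)   # slack for the unexplored shift
        ucbs.append(emp + stat + shift)
        lcbs.append(emp - stat - shift)
    # Intersect the intervals obtained from all previous deployments.
    return max(lcbs), min(ucbs)

# Models whose lower bound exceeds the best upper bound elsewhere can be
# ruled out without ever being deployed.
for theta in np.linspace(-1.5, 1.5, 7):
    lo, hi = risk_bounds(theta)
    print(f"theta={theta:+.2f}  PR(theta) in [{lo:.2f}, {hi:.2f}]")
```

Note that the slack term uses only the sensitivity of the distribution map and the Lipschitz constant of the loss in z; no convexity of the risk is needed, which mirrors the construction described in the abstract.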