In performative prediction, the deployment of a predictive model triggers a shift in the data distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy a model to get feedback about the distribution it induces. We study the problem of finding near-optimal models under performativity while maintaining low regret. On the surface, this problem might seem equivalent to a bandit problem. However, it exhibits a fundamentally richer feedback structure that we refer to as performative feedback: after every deployment, the learner receives samples from the shifted distribution rather than only bandit feedback about the reward. Our main contribution is an algorithm that achieves regret bounds scaling only with the complexity of the distribution shifts and not that of the reward function. The algorithm only relies on smoothness of the shifts and does not assume convexity. Moreover, its final iterate is guaranteed to be near-optimal. The key algorithmic idea is careful exploration of the distribution shifts that informs a novel construction of confidence bounds on the risk of unexplored models. More broadly, our work establishes a conceptual approach for leveraging tools from the bandits literature for the purpose of regret minimization with performative feedback.
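To make the performative feedback structure concrete, the following is a minimal, illustrative sketch rather than the paper's algorithm. It assumes a toy environment where the deployed model shifts the data distribution in a Lipschitz way, and shows how samples observed under one deployed model can be turned into an upper confidence bound on the risk of a nearby, unexplored model. All names (`induced_distribution`, `risk_bound_for_unexplored`) and constants (`eps`, `lip_loss`, `conf_width`) are hypothetical placeholders, not from the paper.

```python
# A minimal sketch of performative feedback, assuming a Lipschitz map from
# deployed model theta to induced distribution D(theta) and a Lipschitz loss.
# This is an illustration of the feedback structure, not the paper's method.

import numpy as np

rng = np.random.default_rng(0)

def induced_distribution(theta, n):
    """Toy performative environment: deploying theta shifts the data mean."""
    return rng.normal(loc=0.8 * theta, scale=1.0, size=n)

def loss(z, theta):
    """Squared loss of predicting theta against observed outcome z."""
    return (z - theta) ** 2

def risk_bound_for_unexplored(theta_deployed, theta_new, samples,
                              eps=0.8, lip_loss=10.0, conf_width=0.1):
    """Upper-bound the risk of theta_new using only samples observed
    under theta_deployed.

    empirical      : loss of theta_new on samples drawn from D(theta_deployed)
    shift_penalty  : eps * |theta_new - theta_deployed| * lip_loss accounts
                     for how far the induced distribution can move
    conf_width     : statistical error of the empirical estimate
    """
    empirical = np.mean(loss(samples, theta_new))
    shift_penalty = eps * abs(theta_new - theta_deployed) * lip_loss
    return empirical + shift_penalty + conf_width

# Deploy one model and receive performative feedback: samples from the
# induced distribution, not just a scalar reward.
theta_deployed = 0.5
samples = induced_distribution(theta_deployed, n=1000)

# Bound the risk of several unexplored models without deploying them.
for theta_new in [0.4, 0.5, 0.6, 0.9]:
    ub = risk_bound_for_unexplored(theta_deployed, theta_new, samples)
    print(f"theta={theta_new:.1f}  risk upper bound <= {ub:.3f}")
```

The point of the sketch is the contrast with bandit feedback: because the learner sees samples from the shifted distribution, a single deployment constrains the risk of many unexplored models at once, with slack that grows only with the magnitude of the distribution shift.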