As data-driven methods are deployed in real-world settings, the processes that generate the observed data will often react to the decisions of the learner. For example, a data source may have an incentive for the algorithm to assign a particular label (e.g., approve a bank loan) and may manipulate its features accordingly. Work in strategic classification and decision-dependent distributions seeks to characterize the closed-loop behavior of deploying learning algorithms by explicitly considering the effect of the classifier on the underlying data distribution. More recently, works in performative prediction seek to classify the closed-loop behavior by considering general properties of the mapping from classifier to data distribution, rather than an explicit form. Building on this notion, we analyze repeated risk minimization as the perturbed trajectories of the gradient flows of performative risk minimization. We consider the case where there may be multiple local minimizers of performative risk, motivated by situations where the initial conditions may have a significant impact on the long-term behavior of the system. We provide sufficient conditions to characterize the region of attraction for the various equilibria in this setting. Additionally, we introduce the notion of performative alignment, which provides a geometric condition on the convergence of repeated risk minimization to performative risk minimizers.
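As a rough illustration of the two dynamics named in the abstract, the sketch below compares repeated risk minimization with a discretized gradient flow on the performative risk. Everything in it is an assumption made for illustration and is not taken from the paper: a one-dimensional model in which deploying parameter theta induces a unit-variance data distribution with mean `mu0 + eps * theta`, a squared loss, and the hypothetical function names used here. In this toy model the performative risk has a single minimizer, so it does not capture the multiple-equilibria setting the paper studies.

```python
def repeated_risk_minimization(mu0, eps, theta0, steps):
    """Iterate theta_{t+1} = argmin_theta E_{z ~ D(theta_t)}[(z - theta)^2 / 2].

    Assumed toy model: D(theta) has mean mu0 + eps * theta, so the
    minimizer of the squared loss on D(theta_t) is exactly that mean.
    """
    theta = theta0
    trajectory = [theta]
    for _ in range(steps):
        theta = mu0 + eps * theta  # retrain on data induced by the last model
        trajectory.append(theta)
    return trajectory


def performative_risk_gradient(theta, mu0, eps):
    """Gradient of PR(theta) = E_{z ~ D(theta)}[(z - theta)^2 / 2].

    With unit variance, PR(theta) = ((mu0 + (eps - 1) * theta)^2 + 1) / 2.
    """
    return (eps - 1.0) * (mu0 + (eps - 1.0) * theta)


def gradient_flow_euler(mu0, eps, theta0, steps, step_size):
    """Forward-Euler discretization of the flow d(theta)/dt = -grad PR(theta)."""
    theta = theta0
    trajectory = [theta]
    for _ in range(steps):
        theta = theta - step_size * performative_risk_gradient(theta, mu0, eps)
        trajectory.append(theta)
    return trajectory


if __name__ == "__main__":
    rrm = repeated_risk_minimization(mu0=1.0, eps=0.5, theta0=0.0, steps=50)
    flow = gradient_flow_euler(mu0=1.0, eps=0.5, theta0=0.0, steps=200, step_size=0.5)
    print("RRM final iterate:", rrm[-1])
    print("Gradient-flow final iterate:", flow[-1])
```

Under these assumptions, with `abs(eps) < 1`, both trajectories approach `mu0 / (1 - eps)`, which here is both the fixed point of retraining and the minimizer of the performative risk. That coincidence is a property of this particular toy model and should not be expected in general.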