When a model informs decisions about people, distribution shifts can create undue disparities. However, it is hard for external entities to check for distribution shift, as the model and its training set are often proprietary. In this paper, we introduce and study a black-box auditing method to detect cases of distribution shift that lead to a performance disparity of the model across demographic groups. By extending techniques used in membership and property inference attacks -- which are designed to expose private information from learned models -- we demonstrate that an external auditor can gain the information needed to identify these distribution shifts solely by querying the model. Our experimental results on real-world datasets show that this approach is effective, achieving 80--100% AUC-ROC in detecting shifts involving the underrepresentation of a demographic group in the training set. Researchers and investigative journalists can use our tools to perform non-collaborative audits of proprietary models and expose cases of underrepresentation in the training datasets.
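As a rough illustration of the query-only setting described in the abstract, the sketch below shows how an auditor with black-box access and a small labeled sample from each demographic group could surface a per-group performance gap. This is a simplified stand-in, not the paper's actual inference-attack pipeline: the synthetic data, the `query_model` wrapper, and the error-gap statistic are all illustrative assumptions.

```python
# Minimal sketch of a black-box disparity audit (illustrative, not the
# paper's method): the auditor holds labeled samples per group, queries
# the opaque model, and compares per-group error rates.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Synthetic two-feature data; `shift` moves one group's distribution.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X.sum(axis=1) + rng.normal(scale=0.5, size=n) > 2 * shift).astype(int)
    return X, y

# Stand-in for the proprietary model: trained with group B underrepresented.
Xa, ya = make_group(2000, shift=0.0)
Xb, yb = make_group(100, shift=1.5)   # underrepresented, shifted group
model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.hstack([ya, yb]))

def query_model(X):
    # Black-box access: the auditor sees only predictions, not the model.
    return model.predict(X)

# Auditor side: fresh labeled samples for each demographic group.
Xa_audit, ya_audit = make_group(500, shift=0.0)
Xb_audit, yb_audit = make_group(500, shift=1.5)

err_a = np.mean(query_model(Xa_audit) != ya_audit)
err_b = np.mean(query_model(Xb_audit) != yb_audit)
print(f"group A error: {err_a:.3f}  group B error: {err_b:.3f}  "
      f"gap: {err_b - err_a:.3f}")
# A large gap is the query-only signal that group B may be
# underrepresented in (or shifted relative to) the training data.
```

In the paper's setting the auditor goes further, extending membership- and property-inference techniques to attribute such gaps to training-set underrepresentation rather than observing the disparity alone; the sketch only shows the query interface that makes a non-collaborative audit possible.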