We present a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules. Our approach requires virtually no additional data collection and can be applied to a variety of settings, including adapting a model to new environments and modifying it to ignore spurious features. Our code is available at https://github.com/MadryLab/EditingClassifiers.