Counterfactually Augmented Data (CAD) aims to improve out-of-domain generalizability, an indicator of model robustness. The improvement is attributed to CAD promoting core features of the construct over spurious artifacts that happen to correlate with it. Yet over-reliance on core features may lead to unintended model bias. In particular, construct-driven CAD (perturbations of core features) may induce models to ignore the context in which core features are used. Here, we test models for sexism and hate speech detection on challenging data: non-hateful and non-sexist usage of identity and gendered terms. On these hard cases, models trained on CAD, especially construct-driven CAD, show higher false-positive rates than models trained on the original, unperturbed data. Using a diverse set of CAD, both construct-driven and construct-agnostic, reduces such unintended bias.