Due to their superior performance, large-scale pre-trained language models (PLMs) have been widely adopted in many aspects of human society. However, we still lack effective tools to understand the potential bias embedded in these black-box models. Recent advances in prompt tuning suggest the possibility of exploring the internal mechanisms of PLMs. In this work, we propose two token-level sentiment tests: the Sentiment Association Test (SAT) and the Sentiment Shift Test (SST), which use prompts as probes to detect latent bias in PLMs. Our experiments on a collection of sentiment datasets show that both SAT and SST can identify sentiment bias in PLMs, and that SST is further able to quantify the bias. The results also suggest that fine-tuning can amplify the existing bias in PLMs.