We investigate the impact of political ideology biases in training data. Through a set of comparison studies, we examine how biases propagate in several widely used NLP models and how they affect overall retrieval accuracy. Our work highlights the susceptibility of large, complex models to propagating biases from human-selected input, which can degrade retrieval accuracy, and underscores the importance of controlling for these biases. Finally, to mitigate this bias, we propose learning a text representation that is invariant to political ideology while still capturing topic relevance.
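One common realization of such invariant representation learning is adversarial training with a gradient-reversal layer. The PyTorch sketch below is an illustrative assumption, not the paper's actual model: all architecture choices (embedding + LSTM encoder, a binary ideology label, the `InvariantRelevanceModel` name) are hypothetical. A shared encoder feeds both a relevance head and an ideology adversary whose gradient is reversed, so the representation stays predictive of topic relevance while being pushed to carry no ideology signal.

```python
# Minimal sketch of ideology-invariant representation learning via a
# gradient-reversal adversary. Architecture and names are assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on backward,
    so the encoder learns to confuse the ideology classifier."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class InvariantRelevanceModel(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hid_dim=256,
                 n_ideologies=2, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        # Main head: topic-relevance score for a document.
        self.relevance = nn.Linear(hid_dim, 1)
        # Adversary: tries to recover the document's political ideology.
        self.adversary = nn.Linear(hid_dim, n_ideologies)

    def forward(self, token_ids):
        emb = self.embed(token_ids)
        _, (h, _) = self.encoder(emb)
        rep = h[-1]                               # shared text representation
        rel_score = self.relevance(rep)           # trained to stay accurate
        rev = GradReverse.apply(rep, self.lambd)  # reversed gradients
        ideo_logits = self.adversary(rev)         # trained to fail -> invariance
        return rel_score, ideo_logits

# Joint loss: minimize relevance error while the reversed gradient pushes
# the representation toward carrying no ideology information.
model = InvariantRelevanceModel()
tokens = torch.randint(0, 30000, (4, 20))        # toy batch of 4 documents
rel_target = torch.rand(4, 1)
ideo_target = torch.randint(0, 2, (4,))
rel_score, ideo_logits = model(tokens)
loss = nn.functional.mse_loss(rel_score, rel_target) \
     + nn.functional.cross_entropy(ideo_logits, ideo_target)
loss.backward()
```

The design choice here is that invariance is enforced indirectly: the adversary is trained to predict ideology as well as it can, while the reversed gradient updates the encoder in the opposite direction, removing whatever features the adversary relies on without sacrificing the relevance objective.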