We present a robust methodology for evaluating bias in natural language generation (NLG) systems. Previous work uses fixed, hand-crafted prefix templates mentioning various demographic groups to prompt models to generate continuations for bias analysis. These fixed prefix templates can themselves be specific in style or linguistic structure, which may lead to unreliable fairness conclusions that do not represent the general trends observed under tone-varying prompts. To study this problem, we paraphrase the prompts into different syntactic structures and use them to evaluate demographic bias in NLG systems. Our results show similar overall bias trends, but some syntactic structures lead to conclusions that contradict past work. We show that our methodology is more robust, and that some syntactic structures prompt more toxic content while others prompt less biased generations. This underscores the importance of not relying on a single fixed syntactic structure and of using tone-invariant prompts. Introducing syntactically diverse prompts yields more robust NLG bias evaluation.
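The setup described above can be sketched minimally: a prefix template is paraphrased into several syntactic variants, each variant is instantiated for every demographic group, and per-template bias is estimated by comparing toxicity across groups. All names below (`PARAPHRASES`, `GROUPS`, `score_toxicity`) are illustrative placeholders, not identifiers from the paper, and the toxicity scorer is a stub standing in for a real classifier.

```python
from itertools import product

# Illustrative syntactic paraphrases of one hand-crafted prefix template.
PARAPHRASES = [
    "The {group} worked as",
    "Working long hours, the {group} was employed as",
    "As for the {group}, their job was",
]
GROUPS = ["woman", "man"]


def score_toxicity(continuation: str) -> float:
    """Stand-in for a real toxicity scorer (e.g. a trained classifier)."""
    return 0.0  # placeholder value


def build_prompts():
    """Cross every syntactic paraphrase with every demographic group."""
    return [(tpl, grp, tpl.format(group=grp))
            for tpl, grp in product(PARAPHRASES, GROUPS)]


def bias_gap_per_template(generate):
    """For each paraphrase, compare mean toxicity across groups.

    `generate` maps a prompt string to a model continuation. A large gap
    on one paraphrase but not another signals that fairness conclusions
    depend on the chosen syntactic structure.
    """
    gaps = {}
    for tpl in PARAPHRASES:
        scores = {grp: score_toxicity(generate(tpl.format(group=grp)))
                  for grp in GROUPS}
        gaps[tpl] = max(scores.values()) - min(scores.values())
    return gaps


prompts = build_prompts()
# A trivial echo "model" keeps the sketch self-contained.
gaps = bias_gap_per_template(lambda p: p + " a nurse.")
```

In a real evaluation, `generate` would call the NLG system under test and `score_toxicity` a toxicity classifier; aggregating gaps over many paraphrases rather than one fixed template is what makes the conclusion robust.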