Recent studies have exploited advanced generative language models to produce Natural Language Explanations (NLE) of why a given text may be hateful. We propose the Chain of Explanation prompting method, inspired by the chain-of-thought study \cite{wei2022chain}, to generate high-quality NLE for implicit hate speech. We build a benchmark on selected mainstream Pre-trained Language Models (PLMs), including GPT-2, GPT-Neo, OPT, T5, and BART, with evaluation metrics covering lexical, semantic, and faithfulness aspects. To further assess the quality of the generated NLE from a human perspective, we hire human annotators to score its informativeness and clarity. We then examine which automatic evaluation metric correlates best with the human-annotated informativeness and clarity scores.