Generating a chain of thought (CoT) can increase large language model (LLM) performance on a wide range of tasks. Zero-shot CoT evaluations, however, have been conducted primarily on logical tasks (e.g., arithmetic, commonsense QA). In this paper, we perform a controlled evaluation of zero-shot CoT across two sensitive domains: harmful questions and stereotype benchmarks. We find that using zero-shot CoT reasoning in a prompt can significantly increase a model's likelihood of producing undesirable output. Without future advances in alignment or explicit mitigation instructions, zero-shot CoT should be avoided on tasks where models can make inferences about marginalized groups or harmful topics.
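For concreteness, a zero-shot CoT prompt differs from a standard zero-shot prompt only in an appended reasoning trigger (the phrase "Let's think step by step." popularized by Kojima et al., 2022). The sketch below shows only the prompt construction; it makes no assumptions about which model or API is queried, and the example question is illustrative.

```python
# Minimal sketch of zero-shot CoT prompt construction.
# The trigger phrase follows Kojima et al. (2022); the Q:/A: framing
# is one common convention, not a fixed requirement.

COT_TRIGGER = "Let's think step by step."

def build_prompt(question: str, use_cot: bool) -> str:
    """Return a zero-shot prompt, optionally with the CoT trigger appended."""
    prompt = f"Q: {question}\nA:"
    if use_cot:
        prompt += f" {COT_TRIGGER}"
    return prompt

if __name__ == "__main__":
    question = "If a train travels 60 miles in 1.5 hours, what is its average speed?"
    print(build_prompt(question, use_cot=False))  # standard zero-shot prompt
    print(build_prompt(question, use_cot=True))   # zero-shot CoT prompt
```

The controlled comparison in the paper amounts to evaluating model outputs under these two prompt variants, holding the underlying question fixed.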