Large language models generate complex, open-ended outputs: instead of outputting a class label, they write summaries, generate dialogue, or produce working code. To assess the reliability of these open-ended generation systems, we aim to identify qualitative categories of erroneous behavior, beyond identifying individual errors. To hypothesize and test for such qualitative errors, we draw inspiration from human cognitive biases -- systematic patterns of deviation from rational judgement. Specifically, we use cognitive biases as motivation to (i) generate hypotheses for problems that models may have, and (ii) develop experiments that elicit these problems. Using code generation as a case study, we find that OpenAI's Codex errs predictably based on how the input prompt is framed, adjusts outputs towards anchors, and is biased towards outputs that mimic frequent training examples. We then use our framework to elicit high-impact errors such as incorrectly deleting files. Our results indicate that experimental methodology from cognitive science can help characterize how machine learning systems behave.