Large language models are increasingly capable of generating fluent-appearing text with relatively little task-specific supervision. But can these models accurately explain classification decisions? We consider the task of generating free-text explanations using a small number of human-written examples (i.e., in a few-shot manner). We find that (1) authoring higher-quality examples for prompting results in higher-quality generations; and (2) surprisingly, in a head-to-head comparison, crowdworkers often prefer explanations generated by GPT-3 to crowdsourced human-written explanations contained in existing datasets. Crowdworker ratings also show, however, that while models produce factual, grammatical, and sufficient explanations, they have room to improve along axes such as providing novel information and supporting the label. We create a pipeline that combines GPT-3 with a supervised filter that incorporates humans in the loop via binary acceptability judgments. Despite the significant subjectivity intrinsic to judging acceptability, our approach consistently filters for GPT-3-generated explanations deemed acceptable by humans.
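The abstract does not specify implementation details, but the described pipeline (few-shot prompting of GPT-3 to generate candidate explanations, followed by a supervised acceptability filter trained on binary human judgments) can be sketched minimally as follows. All names here are hypothetical: `gpt3_complete` stands in for any GPT-3 completion call, `acceptability_filter` for a trained filter model, and the prompt template is an illustrative assumption rather than the paper's exact format.

```python
from typing import Callable, List

# A small set of hand-authored (input, label, explanation) examples used as the
# few-shot prompt; the abstract's finding (1) suggests their quality matters.
FEW_SHOT_EXAMPLES = [
    ("A man plays the guitar on stage.", "entailment",
     "Playing the guitar on stage is a form of performing music."),
    # ... a handful of additional human-written examples ...
]


def build_prompt(text: str, label: str) -> str:
    """Format the few-shot examples plus the new instance (hypothetical template)."""
    blocks = [
        f"Input: {x}\nLabel: {y}\nExplanation: {e}"
        for (x, y, e) in FEW_SHOT_EXAMPLES
    ]
    blocks.append(f"Input: {text}\nLabel: {label}\nExplanation:")
    return "\n\n".join(blocks)


def explain(
    text: str,
    label: str,
    gpt3_complete: Callable[[str], List[str]],            # wraps a GPT-3 completion call
    acceptability_filter: Callable[[str, str, str], float],  # filter trained on binary human judgments
    threshold: float = 0.5,
) -> List[str]:
    """Generate candidate explanations with GPT-3, keep those the filter deems acceptable."""
    prompt = build_prompt(text, label)
    candidates = gpt3_complete(prompt)                     # e.g., several sampled completions
    return [
        c for c in candidates
        if acceptability_filter(text, label, c) >= threshold
    ]
```

The design choice reflected here is separation of concerns: generation quality is controlled through the prompt examples, while the supervised filter absorbs the (subjective) human notion of acceptability via binary judgments.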