Many natural language processing (NLP) tasks rely on labeled data to train machine learning models that achieve high performance. However, data annotation can be time-consuming and expensive, especially when the task involves a large amount of data or requires specialized domain knowledge. Recently, GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot abilities across various NLP tasks. In this paper, we first claim that large language models (LLMs), such as GPT-3.5, can serve as excellent crowdsourced annotators when provided with sufficient guidance and demonstration examples. To make LLMs better annotators, we propose a two-step approach, 'explain-then-annotate'. More precisely, we begin by creating prompts for every demonstration example, which we then use to prompt an LLM to explain why the specific ground-truth answer/label was chosen for that example. We then construct a few-shot chain-of-thought prompt from these self-generated explanations and employ it to annotate the unlabeled data. We conduct experiments on three tasks: user input and keyword relevance assessment, BoolQ, and WiC. The annotation results from GPT-3.5 surpass those from crowdsourced annotation on user input and keyword relevance assessment. On the other two tasks, GPT-3.5 achieves results comparable to those obtained through crowdsourced annotation.
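The explain-then-annotate flow described above can be sketched roughly as follows; the prompt wording, the `call_llm` helper, and the input/label field names are illustrative assumptions rather than the paper's exact prompts or API.

```python
from typing import Callable, Dict, List

# Hypothetical helper type: wrap whichever LLM API you use (e.g. a GPT-3.5 endpoint)
# as a function that maps a prompt string to the model's text completion.
LLMFn = Callable[[str], str]


def generate_explanation(example: Dict[str, str], call_llm: LLMFn) -> str:
    """Step 1: ask the LLM why the ground-truth label fits this demonstration example."""
    prompt = (
        f"Input: {example['input']}\n"
        f"Label: {example['label']}\n"
        "Explain briefly why this label is correct."
    )
    return call_llm(prompt)


def annotate(unlabeled_input: str,
             demonstrations: List[Dict[str, str]],
             call_llm: LLMFn) -> str:
    """Step 2: build a few-shot chain-of-thought prompt from the self-generated
    explanations and use it to label a new, unlabeled example."""
    shots = []
    for ex in demonstrations:
        explanation = generate_explanation(ex, call_llm)
        shots.append(
            f"Input: {ex['input']}\n"
            f"Explanation: {explanation}\n"
            f"Label: {ex['label']}"
        )
    prompt = "\n\n".join(shots) + f"\n\nInput: {unlabeled_input}\nExplanation:"
    # The model is prompted to reason first (explanation), then emit a label,
    # mirroring the chain-of-thought annotation step.
    return call_llm(prompt)
```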