Many NLP applications require manual data annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and complexity of the task, annotation may be conducted by crowd-workers on platforms such as MTurk or by trained annotators, such as research assistants. Using a sample of 2,382 tweets, we demonstrate that ChatGPT outperforms crowd-workers for several annotation tasks, including relevance, stance, topic, and frame detection. Specifically, the zero-shot accuracy of ChatGPT exceeds that of crowd-workers for four out of five tasks, while ChatGPT's intercoder agreement exceeds that of both crowd-workers and trained annotators for all tasks. Moreover, the per-annotation cost of ChatGPT is less than $0.003 -- about twenty times cheaper than MTurk. These results show the potential of large language models to drastically increase the efficiency of text classification.