The release of ChatGPT has revealed a range of possibilities in which large language models (LLMs) could substitute for human intelligence. In this paper, we investigate whether ChatGPT can reproduce human-generated label annotations in social computing tasks. Such a capability could significantly reduce the cost and complexity of social computing research. To this end, we use ChatGPT to re-label five seminal datasets covering stance detection (two datasets), sentiment analysis, hate speech, and bot detection. Our results show that ChatGPT does have the potential to handle these data annotation tasks, although a number of challenges remain. ChatGPT obtains an average precision of 0.609. Performance is highest on the sentiment analysis dataset, where ChatGPT correctly annotates 64.9% of tweets. However, we show that performance varies substantially across individual labels. We believe this work can open up new lines of analysis and serve as a basis for future research into using ChatGPT for human annotation tasks.
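The abstract reports two kinds of agreement with the original human labels: an average precision and the share of tweets annotated correctly. It also notes that performance varies by label. The following is a minimal sketch of how such figures can be computed when LLM labels are compared with human gold labels. It assumes that "average precision" means macro-averaged per-label precision, which the abstract does not specify, and that "correctly annotating" means plain accuracy. The function names and the sentiment labels below are hypothetical and for illustration only.

```python
from collections import Counter


def per_label_precision(gold, predicted):
    """Precision for each label the annotator (e.g. an LLM) actually predicted.

    precision(label) = (# items predicted as label whose gold label matches)
                       / (# items predicted as label)
    Labels that are never predicted have undefined precision and are omitted.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")
    predicted_counts = Counter(predicted)
    correct_counts = Counter(p for g, p in zip(gold, predicted) if g == p)
    return {label: correct_counts[label] / n for label, n in predicted_counts.items()}


def macro_precision(gold, predicted):
    """Unweighted mean of the per-label precisions (assumed averaging scheme)."""
    scores = per_label_precision(gold, predicted)
    return sum(scores.values()) / len(scores)


def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the human gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical human labels vs. LLM labels for six tweets.
gold = ["pos", "neg", "neu", "pos", "neg", "neu"]
predicted = ["pos", "neg", "pos", "pos", "neu", "neu"]

print({k: round(v, 3) for k, v in per_label_precision(gold, predicted).items()})
# → {'pos': 0.667, 'neg': 1.0, 'neu': 0.5}
print(round(macro_precision(gold, predicted), 3))  # → 0.722
print(round(accuracy(gold, predicted), 3))  # → 0.667
```

The per-label dictionary shows why a single aggregate can hide large differences between labels. In this toy example, one label reaches perfect precision while another is correct only half the time. Macro-averaging weights every label equally regardless of its frequency. A frequency-weighted average would generally give a different figure.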