Qualitative analysis of textual content unpacks rich and valuable information by assigning labels to the data. However, this process is often labor-intensive, particularly when working with large datasets. While recent AI-based tools have demonstrated utility, researchers may lack readily available AI resources and expertise, and task-specific models further suffer from limited generalizability. In this study, we explored the use of large language models (LLMs) to support deductive coding, a major category of qualitative analysis in which researchers apply a pre-determined codebook to label data with a fixed set of codes. Instead of training task-specific models, a pre-trained LLM can be applied directly to a variety of tasks through prompt learning, without fine-tuning. Using the coding of curiosity-driven questions as a case study, we found that combining GPT-3 with expert-drafted codebooks achieved fair to substantial agreement with expert-coded results. We lay out the challenges and opportunities of using LLMs to support qualitative coding and beyond.