Conversational data is essential in psychology because it can help researchers understand individuals cognitive processes, emotions, and behaviors. Utterance labelling is a common strategy for analyzing this type of data. The development of NLP algorithms allows researchers to automate this task. However, psychological conversational data present some challenges to NLP researchers, including multilabel classification, a large number of classes, and limited available data. This study explored how automated labels generated by NLP methods are comparable to human labels in the context of conversations on adulthood transition. We proposed strategies to handle three common challenges raised in psychological studies. Our findings showed that the deep learning method with domain adaptation (RoBERTa-CON) outperformed all other machine learning methods; and the hierarchical labelling system that we proposed was shown to help researchers strategically analyze conversational data. Our Python code and NLP model are available at https://github.com/mlaricheva/automated_labeling.
翻译:联系数据在心理学中至关重要,因为它可以帮助研究人员理解个人的认知过程、情感和行为。偏差标签是分析这类数据的共同战略。开发NLP算法可以使研究人员实现这项任务的自动化。然而,心理对话数据给NLP研究人员带来了一些挑战,包括多标签分类、多类和有限的可用数据。这项研究探讨了NLP方法产生的自动标签如何与成人转型对话中的人类标签相提并论。我们提出了应对心理研究提出的三个共同挑战的战略。我们的调查结果显示,域适应的深学习方法(ROBERTA-CON)优于所有其他机器学习方法;以及我们提议的等级标签系统,显示它有助于研究人员战略性地分析谈话数据。我们的Python代码和NLP模型可在https://github.com/mlaricheva/a自动化标签上查阅。