Open-ended questions in surveys are valuable because they do not constrain respondents' answers, thereby avoiding biases. However, answers to open-ended questions are text data, which are harder to analyze. Traditionally, answers were classified manually as specified in a coding manual. Most of the effort to automate coding has gone into the easier problem of single-label prediction, where each answer is assigned exactly one code. However, open-ends that require multi-label classification, i.e., that are assigned multiple codes, occur frequently. This paper focuses on multi-label classification of text answers to open-ended survey questions in social science surveys. We evaluate the performance of the transformer-based architecture BERT for the German language against traditional multi-label algorithms (Binary Relevance, Label Powerset, Ensemble of Classifier Chains (ECC)) on a German social science survey, the GLES Panel (N=17,584, 55 labels). We find that classification with BERT (forcing at least one label) has the smallest 0/1 loss (13.1%) among the methods considered (18.9%-21.6% for the others). As expected, it is much easier to correctly predict answer texts that correspond to a single label (7.1% loss) than those that correspond to multiple labels ($\sim$50% loss). Because BERT predicts zero labels for only 1.5% of the answers, forcing at least one label, while recommended, ultimately does not lower the 0/1 loss by much. Our work has important implications for social scientists: 1) We have shown that multi-label classification of open-ends with BERT works in the German language. 2) For mildly multi-label classification tasks, the loss now appears small enough to allow fully automatic classification (as compared to semi-automatic approaches). 3) Multi-label classification with BERT requires only a single model; the leading competitor, ECC, iterates through individual single-label predictions.
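To make the approach concrete, the following is a minimal sketch of multi-label prediction with a German BERT model, including the "force at least one label" rule and the 0/1 (exact-match) loss described above. The checkpoint name (`bert-base-german-cased`) and the 0.5 decision threshold are illustrative assumptions, not necessarily the paper's exact setup, and fine-tuning on the coded answers is omitted.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-german-cased"  # assumed German BERT checkpoint
NUM_LABELS = 55                        # label count as in the GLES Panel task

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # configures a BCE-style head
)
model.eval()

def predict_labels(texts, threshold=0.5, force_one_label=True):
    """Return a 0/1 label matrix; optionally force >=1 label per answer."""
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.sigmoid(model(**inputs).logits)  # independent label probs
    preds = (probs >= threshold).int()
    if force_one_label:
        empty = preds.sum(dim=1) == 0   # answers that received zero labels
        top = probs.argmax(dim=1)       # fall back to the most probable label
        preds[empty, top[empty]] = 1
    return preds

def zero_one_loss(preds, gold):
    """Share of answers whose full label set is not predicted exactly."""
    return (preds != gold).any(dim=1).float().mean().item()
```

Note that, unlike ECC, this requires a single forward pass through one model per answer; the sigmoid head scores all 55 labels jointly rather than chaining individual single-label classifiers.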