High-quality text data has become an important data source for social scientists, and recent social science research has seen the success of pretrained deep neural network models such as BERT and RoBERTa. In this paper, we propose a compact pretrained deep neural network, the Transformer Encoder for Social Science (TESS), explicitly designed for text processing tasks in social science research. Using two validation tests, we show that TESS outperforms BERT and RoBERTa by 16.7% on average when the number of training samples is limited (<1,000 training instances). These results demonstrate the advantage of TESS over BERT and RoBERTa on social science text processing tasks. Finally, we discuss the limitations of our model and offer advice for future researchers.
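As a rough illustration of the low-resource fine-tuning setting described above, the sketch below fine-tunes a compact pretrained encoder on a small labeled text corpus using the Hugging Face transformers library. The checkpoint name "tess-base" is hypothetical, standing in for the released TESS weights; substituting "bert-base-uncased" or "roberta-base" reproduces the baseline setup. This is a minimal sketch, not the authors' exact training pipeline.

```python
# Minimal low-resource fine-tuning sketch. The checkpoint "tess-base" is a
# hypothetical placeholder for the TESS weights; the baselines would use
# "bert-base-uncased" or "roberta-base" instead.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "tess-base"  # hypothetical name; replace with the real checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy examples standing in for a small (<1,000 instance) labeled corpus.
texts = ["The bill passed with bipartisan support.",
         "Protesters clashed with police downtown."]
labels = [0, 1]

# Tokenize once and wrap the tensors in a DataLoader.
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
dataset = TensorDataset(encodings["input_ids"],
                        encodings["attention_mask"],
                        torch.tensor(labels))
loader = DataLoader(dataset, batch_size=8, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):  # a few epochs are typical with small training sets
    for input_ids, attention_mask, batch_labels in loader:
        outputs = model(input_ids=input_ids,
                        attention_mask=attention_mask,
                        labels=batch_labels)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```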