This paper introduces supervised machine learning to the literature that measures corporate culture from text documents. We compile a unique data set of employee reviews that human evaluators labeled according to the information each review reveals about the firm's corporate culture. Using this data set, we fine-tune state-of-the-art transformer-based language models to perform the same classification task. In out-of-sample predictions, our language models classify 16 to 28 percentage points more employee reviews in line with human evaluators than traditional text classification approaches do.
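The comparison above is stated in percentage points, that is, as a difference between two agreement rates measured on held-out reviews. The following minimal sketch illustrates that arithmetic only. The function names, the culture labels, and the toy predictions are all hypothetical and are not taken from the paper's data or code.

```python
from typing import Sequence


def agreement_rate(predicted: Sequence[str], human: Sequence[str]) -> float:
    """Share of reviews whose predicted label matches the human label."""
    if len(predicted) != len(human):
        raise ValueError("predicted and human must have the same length")
    if len(human) == 0:
        raise ValueError("at least one labeled review is required")
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)


def percentage_point_gap(model_rate: float, baseline_rate: float) -> float:
    """Difference between two rates, expressed in percentage points."""
    return 100.0 * (model_rate - baseline_rate)


# Hypothetical held-out labels (illustrative only, not the paper's data).
human_labels = ["innovation", "integrity", "teamwork", "none", "innovation"]
# Hypothetical predictions from a fine-tuned language model.
language_model = ["innovation", "integrity", "teamwork", "none", "teamwork"]
# Hypothetical predictions from a traditional text classifier.
baseline = ["innovation", "none", "teamwork", "integrity", "teamwork"]

model_rate = agreement_rate(language_model, human_labels)
baseline_rate = agreement_rate(baseline, human_labels)
gap = percentage_point_gap(model_rate, baseline_rate)
print(f"model: {model_rate:.2f}, baseline: {baseline_rate:.2f}, gap: {gap:.1f} pp")
```

A gap reported this way is an absolute difference between rates, not a relative improvement, which is why "percentage points" rather than "percent" is the appropriate unit.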