The volume of scientific publications in organizational research has become overwhelming for human researchers who seek to extract and review knowledge in a timely manner. This paper introduces natural language processing (NLP) models to accelerate the discovery, extraction, and organization of theoretical developments (i.e., hypotheses) from social science publications. We illustrate and evaluate NLP models in the context of a systematic review of stakeholder value constructs and hypotheses. Specifically, we develop NLP models to automatically 1) classify sentences in scholarly documents as hypotheses or non-hypotheses (Hypothesis Detection), 2) deconstruct the hypotheses into nodes (constructs) and links (causal/associative relationships) (Relationship Deconstruction), and 3) classify the features of links in terms of causality (versus association) and direction (positive, negative, or nonlinear) (Feature Classification). Our models achieve high performance on all three tasks. While our models are built in Python, we have made the pre-trained models fully accessible to non-programmers: we provide instructions for installing and using them via an R Shiny app graphical user interface (GUI). Finally, we suggest paths for extending our methodology for computer-assisted knowledge synthesis.
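To make the combined output of the three tasks concrete, the Python sketch below shows one possible way to represent a detected hypothesis together with its nodes (constructs) and links (relationships with causality and direction features). It covers only the data structure, not the NLP models that would populate it. All names (`Hypothesis`, `Link`, `Direction`, `source`, `target`, `causal`) are hypothetical and are not the output format of the pre-trained models described above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Direction(Enum):
    """Direction labels for a link (hypothetical encoding)."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NONLINEAR = "nonlinear"


@dataclass(frozen=True)
class Link:
    """A relationship between two constructs (hypothetical schema)."""
    source: str           # antecedent construct
    target: str           # outcome construct
    causal: bool          # True = causal, False = associative
    direction: Direction  # positive, negative, or nonlinear


@dataclass
class Hypothesis:
    """A sentence flagged as a hypothesis, plus its deconstructed links."""
    sentence: str
    links: List[Link] = field(default_factory=list)

    @property
    def nodes(self) -> List[str]:
        """Unique constructs, in order of first appearance across links."""
        seen: List[str] = []
        for link in self.links:
            for construct in (link.source, link.target):
                if construct not in seen:
                    seen.append(construct)
        return seen


if __name__ == "__main__":
    # Invented example sentence, for illustration only.
    h = Hypothesis(
        sentence="H1: Stakeholder engagement is positively associated with firm reputation.",
        links=[Link("stakeholder engagement", "firm reputation",
                    causal=False, direction=Direction.POSITIVE)],
    )
    print(h.nodes)  # ['stakeholder engagement', 'firm reputation']
```

Keeping nodes as a derived property (rather than a separately stored list) guarantees that every construct in the node set actually appears in at least one link.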