A number of text-based labeling systems have been proposed to help monitor work on the United Nations (UN) Sustainable Development Goals (SDGs). Here, we present a systematic comparison of these systems using a variety of text sources and show that they differ considerably in their sensitivity (i.e., true-positive rate) and specificity (i.e., true-negative rate), exhibit systematic biases (e.g., they are more sensitive to some SDGs than to others), and are susceptible to the type and amount of text analyzed. We then show that an ensemble model that pools labeling systems alleviates some of these limitations, exceeding the labeling performance of all currently available systems. We conclude that researchers and policymakers should pay attention to the choice of labeling system and that ensemble methods should be favored when drawing conclusions about the absolute and relative prevalence of work on the SDGs based on automated methods.
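To make the evaluation terms concrete, the sketch below computes per-SDG sensitivity and specificity using their standard definitions (sensitivity = true-positive rate, specificity = true-negative rate) and pools several labeling systems by simple majority vote. This is an illustrative sketch under stated assumptions, not the authors' ensemble model. The abstract does not say how its ensemble pools systems, so majority voting is a stand-in, and the function names `confusion_rates` and `majority_vote` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set


def confusion_rates(predicted: Sequence[bool], actual: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity and specificity for one SDG across a set of documents.

    predicted[i] is True if a labeling system assigned the SDG to document i;
    actual[i] is True if the document is truly about that SDG (gold label).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)              # labeled and relevant
    fn = sum((not p) and a for p, a in pairs)        # missed relevant documents
    tn = sum((not p) and (not a) for p, a in pairs)  # correctly left unlabeled
    fp = sum(p and (not a) for p, a in pairs)        # labeled but not relevant
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"sensitivity": sensitivity, "specificity": specificity}


def majority_vote(labels_per_system: List[Set[int]], min_votes: Optional[int] = None) -> Set[int]:
    """Pool the SDG labels that several systems assign to ONE document.

    Hypothetical pooling rule (assumption): keep an SDG if at least
    `min_votes` systems assign it; by default, a strict majority of systems.
    """
    if min_votes is None:
        min_votes = len(labels_per_system) // 2 + 1
    counts: Counter = Counter()
    for labels in labels_per_system:
        counts.update(labels)  # each system votes at most once per SDG (sets)
    return {sdg for sdg, votes in counts.items() if votes >= min_votes}


if __name__ == "__main__":
    # Three hypothetical systems labeling one document with SDG numbers.
    print(majority_vote([{3, 13}, {13}, {13, 7}]))
    # One system's decisions for a single SDG over five documents vs. gold labels.
    print(confusion_rates([True, True, False, False, False],
                          [True, False, False, True, False]))
```

In the usage example, only SDG 13 is assigned by at least two of the three systems, so the pooled label set is `{13}`; the single-system rates come out to a sensitivity of 0.5 and a specificity of about 0.67. Lowering `min_votes` trades specificity for sensitivity, which is one simple way an ensemble can offset the systematic biases of individual systems.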