Methods and applications are inextricably linked in science, particularly in the domain of text-as-data. In this paper, we examine one such text-as-data application: an established economic index that measures economic policy uncertainty from keyword occurrences in news. This index, which has been shown to correlate with firm investment, employment, and excess market returns, has had substantive impact in both the private sector and academia. Yet, as we revisit and extend the original authors' annotations and text measurements, we find interesting text-as-data methodological research questions: (1) Are annotator disagreements a reflection of ambiguity in language? (2) Do alternative text measurements correlate with one another and with measures of external predictive validity? We find for this application that (1) some annotator disagreements about economic policy uncertainty can be attributed to ambiguity in language, and (2) switching measurements from keyword matching to supervised machine learning classifiers results in low correlation, a concerning implication for the validity of the index.