Guidance on how to validate computational text-based measures of social science constructs is fragmented. While scholars generally acknowledge the importance of validating their text-based measures, they often lack common terminology and a unified framework to do so. This paper introduces ValiTex, a new validation framework designed to assist scholars in validly measuring social science constructs based on textual data. ValiTex requires researchers to demonstrate three types of validity evidence: substantive evidence (outlining the theoretical underpinning of the measure), structural evidence (examining the properties of the text model and its output), and external evidence (testing how the measure relates to independent information). In addition to the framework, ValiTex offers practical guidance through a checklist that is adaptable to different use cases. The checklist defines specific validation steps and assesses the importance of each step for establishing validity. We demonstrate the utility of the framework by applying it to a use case of detecting sexism in social media data.