External validity is often questionable in empirical research, especially in randomized experiments, because of the trade-off between internal and external validity. To quantify the robustness of external validity, one must first conceptualize the gap between a sample that is fully representative of the target population (i.e., the ideal sample) and the observed sample. Drawing on Frank & Min (2007) and Frank et al. (2013), I define this gap as the unobserved sample and quantify its relationship with null hypothesis statistical testing (NHST) in this study. The probability of invalidating a causal inference due to limited external validity, i.e., the PEV, is the probability of failing to reject the null hypothesis based on the ideal sample, given that the null hypothesis has been rejected based on the observed sample. This study illustrates the guidelines and the procedure for evaluating external validity with the PEV through an empirical example (Borman et al., 2008). Specifically, one can locate the threshold of the unobserved sample statistic beyond which the PEV exceeds a desired value, and use this threshold to characterize the unobserved samples that would make the external validity of the research in question less robust. The PEV is shown to be linked to statistical power when the NHST is thought of as being based on the ideal sample.
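To make the definition concrete, the following is a minimal sketch of a PEV-style calculation under simplifying assumptions that are not taken from the paper: the ideal-sample estimate is treated as a weighted mix of the observed and unobserved estimates, its sampling distribution is taken to be normal with a known standard error, the observed effect is positive, and rejection in the opposite tail is ignored. The names `pev_sketch`, `unobs_threshold`, `pi`, and `ideal_se` are hypothetical, and the author's own derivation may differ.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def pev_sketch(obs_est, unobs_est, pi, ideal_se, alpha=0.05):
    """Approximate probability of failing to reject H0 on the ideal sample.

    Assumptions (illustrative only, not the paper's derivation):
      - pi is the share of the ideal sample that is unobserved,
      - the ideal-sample estimate is (1 - pi) * obs_est + pi * unobs_est,
      - that estimate is normal with standard error ideal_se,
      - the observed effect is positive; the lower rejection tail is ignored.
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    if ideal_se <= 0:
        raise ValueError("ideal_se must be positive")
    ideal_est = (1 - pi) * obs_est + pi * unobs_est
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # P(estimate / se < z_crit) when the estimate is centred at ideal_est;
    # this equals one minus the (upper-tail) power of the ideal-sample test.
    return STD_NORMAL.cdf(z_crit - ideal_est / ideal_se)


def unobs_threshold(obs_est, pi, ideal_se, target_pev, alpha=0.05):
    """Unobserved-sample estimate at which pev_sketch equals target_pev.

    Under the same assumptions, unobserved estimates below this value
    give a PEV above target_pev.
    """
    if not 0.0 < pi <= 1.0:
        raise ValueError("pi must lie in (0, 1]")
    if not 0.0 < target_pev < 1.0:
        raise ValueError("target_pev must lie in (0, 1)")
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Invert PEV = Phi(z_crit - ideal_est / ideal_se) for ideal_est ...
    ideal_needed = ideal_se * (z_crit - STD_NORMAL.inv_cdf(target_pev))
    # ... then solve the weighted-mix equation for the unobserved estimate.
    return (ideal_needed - (1 - pi) * obs_est) / pi


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the functions.
    print(pev_sketch(obs_est=0.3, unobs_est=0.1, pi=0.5, ideal_se=0.1))
    print(unobs_threshold(obs_est=0.3, pi=0.5, ideal_se=0.1, target_pev=0.5))
```

In this sketch the returned probability is one minus the upper-tail power of a test centred on the ideal-sample estimate, which mirrors the link between the PEV and statistical power noted above. The threshold function inverts the same expression, so it gives the unobserved-sample value below which the PEV would exceed the chosen target.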