Researchers may perform regressions using a sketch of data of size $m$ instead of the full sample of size $n$ for a variety of reasons. This paper considers the case when the regression errors do not have constant variance and heteroskedasticity-robust standard errors would normally be needed for test statistics to provide accurate inference. We show that estimates using data sketched by random projections will behave "as if" the errors were homoskedastic. Estimation by random sampling would not have this property. The result arises because the sketched estimates in the case of random projections can be expressed as degenerate $U$-statistics, and under certain conditions, these statistics are asymptotically normal with homoskedastic variance. We verify that the conditions hold not only in the case of least squares regression when the covariates are exogenous, but also in instrumental variables estimation when the covariates are endogenous. The result implies that inference, including first-stage $F$ tests for instrument relevance, can be simpler than in the full sample case if the sketching scheme is appropriately chosen.
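A small simulation may help make the contrast between the two sketching schemes concrete. The sketch below is illustrative only and is not taken from the paper: it assumes a Gaussian random projection (one of several possible projection schemes), a single regressor with no intercept, errors whose conditional variance is $x^2$, and arbitrary sizes $n = 5000$ and $m = 1000$. The helper `ols_with_ses` is a hypothetical name introduced here. It compares the homoskedastic and heteroskedasticity-robust (White, HC0) standard errors computed on each sketch.

```python
import numpy as np


def ols_with_ses(X, y):
    """OLS coefficients with homoskedastic and White (HC0) standard errors.

    Hypothetical helper for this illustration; X is (rows, k), y is (rows,).
    """
    rows, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Homoskedastic formula: s^2 (X'X)^{-1}
    s2 = resid @ resid / (rows - k)
    se_homo = np.sqrt(s2 * np.diag(XtX_inv))
    # White sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
    meat = (X * resid[:, None] ** 2).T @ X
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_homo, se_robust


rng = np.random.default_rng(0)
n, m = 5000, 1000  # assumed full-sample and sketch sizes

# Heteroskedastic design (an assumption of this sketch): Var(e | x) = x^2.
x = rng.standard_normal(n)
e = np.abs(x) * rng.standard_normal(n)
y = 1.0 * x + e
X = x[:, None]

# Sketch 1: Gaussian random projection, entries N(0, 1/m).
Pi = rng.standard_normal((m, n)) / np.sqrt(m)
_, se_h_rp, se_r_rp = ols_with_ses(Pi @ X, Pi @ y)

# Sketch 2: uniform random sampling of m rows without replacement.
idx = rng.choice(n, size=m, replace=False)
_, se_h_rs, se_r_rs = ols_with_ses(X[idx], y[idx])

# Ratio of robust to homoskedastic standard errors for the slope.
ratio_rp = float(se_r_rp[0] / se_h_rp[0])
ratio_rs = float(se_r_rs[0] / se_h_rs[0])
print(f"random projection: robust/homoskedastic SE = {ratio_rp:.3f}")
print(f"random sampling:   robust/homoskedastic SE = {ratio_rs:.3f}")
```

Under these assumptions, the ratio for the projected data should sit near one, so the simpler homoskedastic formula is adequate there, while the ratio for the sampled data should stay well above one, since the sampled rows inherit the heteroskedasticity of the original errors. The exact figures depend on the seed and on the chosen design.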