Researchers may use a sketch of the data of size $m$ instead of the full sample of size $n$, sometimes to relieve the computational burden and other times to maintain data privacy. This paper considers the case in which full-sample estimation would have required Eicker-Huber-White robust standard errors to account for heteroskedasticity. We show that random projections have a smoothing effect on the sketched data, with the consequence that least squares estimates computed from such sketched data behave 'as if' the errors were homoskedastic. This result is obtained by expressing the difference between the moments computed from the full sample and those computed from the sketched data as a degenerate $U$-statistic, which is asymptotically normal with a homoskedastic variance when the conditions in Hall (1984) are satisfied. The result also holds for two-stage least squares, for which we analyze both algorithmic and statistical properties. Sketches produced by random sampling, however, do not homogenize the error variances.
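The contrast between projection and sampling can be illustrated with a small simulation. This is a minimal sketch under assumptions of our own, not the paper's procedure. It uses a single regressor with no intercept, errors whose standard deviation is $x_i^2$, and a Gaussian sketching matrix $\Pi$ with iid $N(0, 1/m)$ entries. All helper names are hypothetical. Conditional on the full data, each sketched row $(\tilde x_j, \tilde e_j)$ is a Gaussian combination of all $n$ rows, so the variance of $\tilde e_j$ given $\tilde x_j$ is the same for every $j$. Under uniform row sampling, the sketched rows are simply original rows, so the heteroskedasticity carries over. The code compares the classical (homoskedastic) standard error with the HC0 (Eicker-Huber-White) standard error, both computed on the sketched data.

```python
import math
import random


def ols_through_origin(x, y):
    """OLS slope for y = b*x + e (no intercept) with classical and HC0 standard errors."""
    m = len(x)
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - b * xi for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (m - 1)        # error variance assuming homoskedasticity
    se_classical = math.sqrt(s2 / sxx)
    meat = sum((xi * r) ** 2 for xi, r in zip(x, resid))
    se_hc0 = math.sqrt(meat) / sxx                  # Eicker-Huber-White (HC0) sandwich
    return b, se_classical, se_hc0


def gaussian_sketch(x, y, m, rng):
    """Return (Pi x, Pi y) for an m-by-n Pi with iid N(0, 1/m) entries, built one row at a time."""
    scale = 1.0 / math.sqrt(m)
    sx, sy = [], []
    for _ in range(m):
        row = [rng.gauss(0.0, scale) for _ in range(len(x))]
        sx.append(sum(g * xi for g, xi in zip(row, x)))
        sy.append(sum(g * yi for g, yi in zip(row, y)))
    return sx, sy


def row_sample(x, y, m, rng):
    """Uniformly sample m rows without replacement."""
    idx = rng.sample(range(len(x)), m)
    return [x[i] for i in idx], [y[i] for i in idx]


rng = random.Random(0)
n, m, beta = 2000, 1000, 1.0
x = [rng.uniform(0.0, 1.0) for _ in range(n)]
# Heteroskedastic errors: the standard deviation of e_i grows like x_i**2.
y = [beta * xi + xi * xi * rng.gauss(0.0, 1.0) for xi in x]

ratios = {}
for name, sketch in [("gaussian projection", gaussian_sketch), ("row sampling", row_sample)]:
    sx, sy = sketch(x, y, m, rng)
    b, se_c, se_hc = ols_through_origin(sx, sy)
    ratios[name] = se_hc / se_c
    print(f"{name:20s} b={b:.3f}  SE classical={se_c:.4f}  SE HC0={se_hc:.4f}  ratio={ratios[name]:.2f}")
```

In this design the HC0-to-classical ratio should be close to 1 for the Gaussian projection. Under row sampling its population value is $\sqrt{E[x^6]/(E[x^2]\,E[x^4])} = \sqrt{15/7} \approx 1.46$. The sketch size $m = 1000$ is large relative to $n$ only to keep the Monte Carlo noise in the ratio small. In practice $m \ll n$.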
翻译:研究人员可能会使用一个大小为百万美元的数据草图,而不是完整的大小样本,有时用美元来减轻计算负担,有时则用其他时间来维护数据隐私。本文考虑了全面抽样估计需要Eicker-Huber-White严格标准错误来说明三重心动性的情况。我们显示,随机预测对草图数据具有平滑效果,因此使用这种草图数据的最小方位估计值“如”错误是同性恋式的。通过表达从完整样本中计算的时间与草图数据之间的差别,得出这一结果是因为在Hall(1984年)的条件得到满足时,该结果将呈现出一个极低的U$-美元-统计学标准,与同质心动差异无异。这个结果对分析算法和统计属性的两阶段最小方形也存在。然而,通过随机抽样产生的骨架不会产生将误差同化的效果。