Sketching algorithms use random projections to generate a smaller sketched data set, often for the purposes of modelling. Complete and partial sketch regression estimates can be constructed using information from only the sketched data set or from a combination of the full and sketched data sets, respectively. Previous work has obtained the distribution of these estimators under repeated sketching, along with the first two moments for both estimators. Using a different approach, we also derive the distribution of the complete sketch estimator, but additionally consider the error term under both repeated sketching and repeated sampling. Importantly, we obtain pivotal quantities that are based solely on the sketched data set and specifically do not require information from the full data model fit. These pivotal quantities can be used for inference on the full data set regression estimates or the model parameters. For partial sketching, we derive pivotal quantities for a marginal test and an approximate distribution for the partial sketch estimator under repeated sketching or repeated sampling, again avoiding reliance on a full data model fit. We extend these results to include the Hadamard and Clarkson-Woodruff sketches, then compare them in a simulation study.
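
To make the distinction between the two estimators concrete, the following is a minimal Python sketch. The abstract does not give formulas, so it assumes the definitions commonly used in the sketching literature: a Gaussian sketch matrix S (k by n, i.i.d. N(0, 1/k) entries), a complete sketch estimate computed from SX and Sy only, and a partial sketch estimate that combines the sketched Gram matrix with the full-data cross-product X^T y. The function names, the simulated data, and the sizes n, p, k are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def gaussian_sketch(k, n, rng):
    """Assumed Gaussian sketch: k x n matrix with i.i.d. N(0, 1/k) entries."""
    return rng.normal(loc=0.0, scale=1.0 / np.sqrt(k), size=(k, n))


def complete_sketch(X, y, S):
    """Least squares fitted to the sketched data (SX, Sy) only."""
    Xs, ys = S @ X, S @ y
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)


def partial_sketch(X, y, S):
    """Sketched Gram matrix combined with the full-data cross-product X^T y."""
    Xs = S @ X
    return np.linalg.solve(Xs.T @ Xs, X.T @ y)


# Illustrative simulated data (hypothetical sizes and coefficients).
rng = np.random.default_rng(0)
n, p, k = 2000, 3, 400
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# Full-data least squares fit, used here only as a reference point.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# One draw of the sketch; repeated sketching would redraw S each time.
S = gaussian_sketch(k, n, rng)
beta_complete = complete_sketch(X, y, S)
beta_partial = partial_sketch(X, y, S)

print("full    :", beta_full)
print("complete:", beta_complete)
print("partial :", beta_partial)
```

Redrawing S with X and y held fixed corresponds to repeated sketching, while redrawing y as well corresponds to repeated sampling. This example covers only the point estimates; it does not attempt to reproduce the pivotal quantities derived in the paper.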