Data aggregation, also known as meta-analysis, is widely used to combine knowledge about parameters shared across multiple studies (e.g., an average treatment effect). In this paper, we introduce a data aggregation scheme that pools summary statistics from existing studies. Our scheme informs the design of new validation studies and yields unbiased estimators for the shared parameters. In our setup, each existing study applies a LASSO regression to select a parsimonious model from a large set of covariates. It is well known that post-hoc estimators computed in the selected model tend to be biased, because the same data are used both to select the model and to estimate its parameters. We show that a novel technique called \textit{data carving} yields a new unbiased estimator by aggregating simple summary statistics from all existing studies. Our estimator has two key features: (a) it makes the fullest possible use of data from all studies without the risk of bias from model selection; (b) it offers the added benefit of individual data privacy, because raw data from these studies need not be shared or stored for efficient estimation.