Amidst rising appreciation for privacy and data usage rights, researchers have increasingly recognized the principle of data minimization, which holds that the accessibility, collection, and retention of subjects' data should be kept to the minimum necessary to answer focused research questions. Applying this principle to randomized controlled trials (RCTs), this paper presents algorithms for drawing precise inferences from RCTs under stringent data retention and anonymization policies. In particular, we show how to use recursive algorithms to construct running estimates of treatment effects in RCTs, thereby allowing individualized records to be deleted or anonymized shortly after collection. Devoting special attention to the case of non-i.i.d. data, we further demonstrate how to draw robust inferences from RCTs by combining recursive algorithms with bootstrap and federated strategies.
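To make the idea of a running estimate concrete, the following is a minimal sketch of a recursive difference-in-means estimator. It is not the paper's algorithm: it uses Welford's standard online update for the mean and variance, assumes independent observations and a simple two-arm design, and does not cover the bootstrap or federated strategies mentioned for non-i.i.d. data. The names `RunningArm` and `RunningATE` are hypothetical, chosen for illustration. The point it shows is that each record can be discarded as soon as it has been folded into a handful of summary statistics.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningArm:
    """Summary statistics for one trial arm (hypothetical helper)."""
    n: int = 0          # number of outcomes seen so far
    mean: float = 0.0   # running mean of the outcomes
    m2: float = 0.0     # running sum of squared deviations from the mean

    def update(self, y: float) -> None:
        # Welford's online update: only n, mean, and m2 are retained.
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (y - self.mean)

    @property
    def variance(self) -> float:
        # Unbiased sample variance; undefined with fewer than 2 outcomes.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


class RunningATE:
    """Running difference-in-means estimate for a two-arm RCT (illustrative)."""

    def __init__(self) -> None:
        self.treated = RunningArm()
        self.control = RunningArm()

    def update(self, treated: bool, outcome: float) -> None:
        # Fold one record into the summaries; the caller may then delete it.
        (self.treated if treated else self.control).update(outcome)

    def estimate(self) -> float:
        return self.treated.mean - self.control.mean

    def std_error(self) -> float:
        # Standard error assuming independent observations in each arm.
        if self.treated.n < 2 or self.control.n < 2:
            return float("nan")
        return math.sqrt(
            self.treated.variance / self.treated.n
            + self.control.variance / self.control.n
        )


# Usage: stream records one at a time, retaining only the summaries.
ate = RunningATE()
for is_treated, y in [(True, 3.0), (False, 1.0), (True, 5.0),
                      (False, 2.0), (True, 7.0), (False, 3.0)]:
    ate.update(is_treated, y)
print(ate.estimate(), ate.std_error())
```

In this sketch the retained state is six numbers regardless of sample size, which is what allows individual records to be deleted or anonymized shortly after collection.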