In multi-center clinical trials, access to individual-level data is, for various reasons, strictly restricted; summarized information, by contrast, is widely available from published results. With advances in computational technology, it has also become common for data analyses to run on hundreds or thousands of machines simultaneously, with the data distributed across those machines and no longer available in a single central location. How to effectively combine summarized clinical data, or the information held by each machine in a parallel computation, has become a challenging task for statisticians and computer scientists. In this paper, we selectively review some recently developed statistical methods, including communication-efficient distributed statistical inference and renewal estimation and incremental inference, which can be regarded as the latest developments of calibration information methods in the era of big data. Even though these methods were developed in different fields and in different statistical frameworks, they are, in principle, asymptotically equivalent to well-known methods developed in meta-analysis: little or no information is lost compared with the case in which the full data are available. As a general tool for integrating information, we also review the generalized method of moments and the estimating equations approach using the empirical likelihood method.
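As a rough illustration of the claim that summaries can carry nearly all of the information in the full data, the sketch below fits a simple linear regression two ways: once from per-site summary statistics only, and once from the pooled raw records. It is not taken from the paper; the function names (`site_summary`, `combine`, `simulate_site`), the simulated data, and the choice of ordinary least squares are all assumptions made for the example. In this special case the agreement is exact up to floating-point error, whereas the methods reviewed in the paper are described only as asymptotically equivalent.

```python
import random

KEYS = ("n", "sx", "sy", "sxx", "sxy")

def site_summary(xs, ys):
    """Summary statistics a site could share instead of raw records (hypothetical helper)."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }

def combine(summaries):
    """Add the per-site summaries and solve the least-squares normal equations."""
    n, sx, sy, sxx, sxy = (sum(s[k] for s in summaries) for k in KEYS)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

def simulate_site(rng, n, intercept=1.0, slope=2.0, noise_sd=0.5):
    """Generate illustrative data for one site (assumed model: y = a + b*x + noise)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

rng = random.Random(0)
sites = [simulate_site(rng, n) for n in (50, 80, 120)]

# Distributed analysis: each site contributes only five numbers.
distributed = combine([site_summary(xs, ys) for xs, ys in sites])

# Centralized analysis: pool the raw records first (often not permitted in practice).
all_x = [x for xs, _ in sites for x in xs]
all_y = [y for _, ys in sites for y in ys]
pooled = combine([site_summary(all_x, all_y)])

print("distributed (intercept, slope):", distributed)
print("pooled      (intercept, slope):", pooled)
```

The two estimates coincide here because the least-squares solution depends on the data only through additive sums, so nothing is lost by sharing the sums alone. For nonlinear models no such finite set of sums generally exists, which is where the asymptotic arguments mentioned in the abstract come in.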