Complex Coordinate-Based Meta-Analysis with Probabilistic Programming

With the growing number of published functional magnetic resonance imaging (fMRI) studies, meta-analysis databases and models have become an integral part of brain mapping research. Coordinate-based meta-analysis (CBMA) databases are built by automatically extracting both the coordinates of reported peak activations and term associations using natural language processing (NLP) techniques. Solving term-based queries on these databases makes it possible to obtain statistical maps of the brain related to specific cognitive processes. However, with tools like Neurosynth, only single-term queries lead to statistically reliable results. When solving richer queries, too few studies from the database contribute to the statistical estimations. We design a probabilistic domain-specific language (DSL) built on Datalog and one of its probabilistic extensions, CP-Logic, for expressing and solving rich logic-based queries. We encode a CBMA database into a probabilistic program. Using the joint distribution of its Bayesian network translation, we show that solutions of queries on this program compute the correct probability distributions of voxel activations. We explain how recent lifted query processing algorithms make it possible to scale to the size of large neuroimaging data, where state-of-the-art knowledge compilation (KC) techniques fail to solve queries fast enough for practical applications. Finally, we introduce a method for relating studies to terms probabilistically, leading to better solutions for conjunctive queries on smaller databases. We demonstrate results for two-term conjunctive queries, both on simulated meta-analysis databases and on the widely used Neurosynth database.
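
To make the idea of a two-term conjunctive query with probabilistic study-term associations concrete, here is a minimal Python sketch. It is not the DSL described above and does not use Datalog, CP-Logic, or lifted inference. It only illustrates, on a hypothetical toy database, how a conditional activation probability can be estimated when each study is linked to each term with some probability. The study identifiers, voxel ids, term names, association probabilities, and the function name `conjunctive_activation` are all assumptions made for illustration. The sketch further assumes that a study is drawn uniformly at random and that term associations are independent given the study.

```python
from typing import Dict, Iterable

# Hypothetical toy CBMA database (illustrative values only):
# each study reports a set of activated voxel ids and carries a
# probability of being associated with each term.
STUDIES: Dict[str, dict] = {
    "s1": {"voxels": {1, 2}, "terms": {"pain": 0.9, "memory": 0.8}},
    "s2": {"voxels": {2, 3}, "terms": {"pain": 0.5, "memory": 0.0}},
    "s3": {"voxels": {2}, "terms": {"pain": 1.0, "memory": 1.0}},
}


def conjunctive_activation(
    studies: Dict[str, dict], voxel: int, terms: Iterable[str]
) -> float:
    """Estimate P(voxel active | all terms associated).

    Assumes a uniformly selected study and independent term
    associations, so a study's weight is the product of its
    per-term association probabilities.
    """
    terms = list(terms)
    numerator = 0.0    # weight of studies that match the terms AND report the voxel
    denominator = 0.0  # weight of all studies that match the terms
    for study in studies.values():
        weight = 1.0
        for term in terms:
            weight *= study["terms"].get(term, 0.0)
        denominator += weight
        if voxel in study["voxels"]:
            numerator += weight
    if denominator == 0.0:
        raise ValueError("no study is associated with all query terms")
    return numerator / denominator


if __name__ == "__main__":
    for v in (1, 2, 3):
        p = conjunctive_activation(STUDIES, v, ["pain", "memory"])
        print(f"voxel {v}: {p:.3f}")
```

With hard (0 or 1) associations, a study contributes to the conjunction only if it is tagged with every term, which is why few studies remain for richer queries. Soft associations let partially matching studies contribute with a reduced weight, which is the intuition behind relating studies to terms probabilistically.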
