Extracting low-dimensional summary statistics from large datasets is essential for efficient (likelihood-free) inference. We propose obtaining summary statistics by minimizing the expected posterior entropy (EPE) under the prior predictive distribution of the model. We show that minimizing the EPE is equivalent both to learning a conditional density estimator for the posterior and to other information-theoretic approaches. Further summary-extraction methods (including minimizing the $L^2$ Bayes risk, maximizing the Fisher information, and model-selection approaches) are special or limiting cases of EPE minimization. We demonstrate that the approach yields high-fidelity summary statistics by applying it to both a synthetic benchmark and a population genetics problem. We not only offer concrete recommendations for practitioners but also provide a unifying perspective for obtaining informative summary statistics.
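The connection between EPE minimization and conditional density estimation can be illustrated with a minimal sketch (not the paper's implementation): for a toy Gaussian model, we jointly fit a linear summary $t(x) = w^\top x$ and a Gaussian conditional density estimator $q(\theta \mid t)$ by minimizing a Monte Carlo estimate of the expected negative log posterior density over prior predictive samples. All names and the linear/Gaussian parameterization are illustrative assumptions; in this conjugate example the learned summary should recover the sufficient statistic, the sample mean.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_sims, d = 2000, 5

# Prior predictive samples: theta ~ N(0, 1), x_i | theta ~ N(theta, 1).
theta = rng.normal(size=n_sims)
x = theta[:, None] + rng.normal(size=(n_sims, d))

def epe_loss(params):
    # Linear summary t(x) = w^T x; Gaussian conditional density estimator
    # q(theta | t) = N(a * t + b, exp(2 * log_s)).
    w = params[:d]
    a, b, log_s = params[d:]
    t = x @ w
    resid = theta - (a * t + b)
    # Monte Carlo estimate of the expected posterior cross-entropy,
    # i.e. E[-log q(theta | t(x))] up to an additive constant.
    return np.mean(0.5 * resid**2 * np.exp(-2.0 * log_s) + log_s)

res = minimize(epe_loss, rng.normal(size=d + 3), method="L-BFGS-B")
w_hat = res.x[:d]

# In this conjugate model the sample mean is sufficient, so the learned
# summary should be (anti-)correlated with it almost perfectly.
corr = np.corrcoef(x @ w_hat, x.mean(axis=1))[0, 1]
print(abs(corr))
```

The loss is exactly the cross-entropy objective the abstract refers to: its minimum over $q$ is the expected posterior entropy, so minimizing it over the summary weights $w$ as well yields an EPE-minimizing summary within the chosen family.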