It has become increasingly common to collect high-dimensional binary data, for example with the emergence of new sampling techniques in ecology. In low dimensions, multivariate probit (MVP) models are routinely used for inference. However, algorithms for fitting such models scale poorly to high dimensions because the likelihood is intractable: it involves an integral over a multivariate normal distribution that has no analytic form. Although a variety of algorithms have been proposed to approximate this integral, these approaches are difficult to implement and/or inaccurate in high dimensions. We propose a two-stage Bayesian approach for inference on model parameters that propagates uncertainty between the stages. We exploit the special structure of latent Gaussian models to avoid the expensive computation involved in joint parameter estimation and instead focus inference on marginal distributions of model parameters. This makes the method essentially embarrassingly parallel in both stages. We illustrate performance in simulations and in applications to joint species distribution modeling in ecology.
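To make the intractable integral concrete, the sketch below evaluates the MVP probability of one binary vector as a multivariate normal orthant probability, using generic numerical integration from SciPy. It is an illustration only and is not the two-stage method proposed here; the function name `mvp_probability` and the values of `mu` and `corr` are assumptions chosen for the example, with the usual convention that y_j = 1 when the latent z_j > 0 and z ~ N(mu, corr).

```python
import itertools

import numpy as np
from scipy.stats import multivariate_normal


def mvp_probability(y, mu, corr):
    """P(Y = y) when y_j = 1{z_j > 0} and z ~ N(mu, corr) (illustrative helper)."""
    y = np.asarray(y)
    mu = np.asarray(mu, dtype=float)
    corr = np.asarray(corr, dtype=float)
    s = 2 * y - 1  # +1 where y_j = 1, -1 where y_j = 0
    # Flipping signs turns the orthant event {s_j * z_j > 0 for all j}
    # into {u <= s * mu} with u ~ N(0, S corr S), i.e. a normal CDF.
    flipped_cov = corr * np.outer(s, s)
    dist = multivariate_normal(mean=np.zeros(len(mu)), cov=flipped_cov)
    return float(dist.cdf(s * mu))


# Hypothetical 3-dimensional example: latent means and a valid correlation matrix.
mu = [0.3, -0.5, 1.0]
corr = [[1.0, 0.4, 0.2],
        [0.4, 1.0, -0.3],
        [0.2, -0.3, 1.0]]

# Probabilities of all 2^3 binary outcomes; they should sum to about 1.
probs = {y: mvp_probability(y, mu, corr)
         for y in itertools.product([0, 1], repeat=3)}
total = sum(probs.values())
print(f"sum over all outcomes: {total:.4f}")
```

In three dimensions this call is cheap. A full likelihood needs one such integral per observation, and the integral becomes slower and less accurate as the dimension grows, which is the scaling problem described above.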