It has become increasingly common to collect high-dimensional binary response data, for example with the emergence of new sampling techniques in ecology. In smaller dimensions, multivariate probit (MVP) models are routinely used for inference. However, algorithms for fitting such models scale poorly to high dimensions because the likelihood is intractable: it involves an integral over a multivariate normal distribution that has no analytic form. Although a variety of algorithms have been proposed to approximate this integral, these approaches are difficult to implement and/or inaccurate in high dimensions. Our main focus is on accommodating high-dimensional binary response data with a small to moderate number of covariates. We propose a two-stage approach for inference on model parameters that propagates uncertainty between the stages. We use the special structure of latent Gaussian models to avoid the highly expensive computation involved in joint parameter estimation, focusing inference instead on the marginal distributions of model parameters. This makes both stages of the method essentially embarrassingly parallel. We illustrate performance in simulations and in applications to joint species distribution modeling in ecology.
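
For concreteness, the intractable integral can be written out in a standard textbook formulation of the MVP model. The notation below is an assumption made for illustration and is not taken from the paper: $y_i \in \{0,1\}^q$ is the binary response vector for observation $i$, $x_i$ its covariate vector, $B$ a coefficient matrix, $\Sigma$ a $q \times q$ correlation matrix, and $\phi_q$ the $q$-variate normal density.

```latex
% Latent-variable form of the multivariate probit model (illustrative notation)
\[
  y_{ij} = \mathbb{1}(z_{ij} > 0), \qquad
  z_i \sim N_q\!\left(B^\top x_i,\ \Sigma\right), \qquad j = 1, \dots, q .
\]
% Likelihood contribution of observation i: a q-dimensional orthant probability
\[
  \Pr(y_i \mid x_i, B, \Sigma)
  = \int_{A_{i1}} \!\cdots\! \int_{A_{iq}}
    \phi_q\!\left(z;\ B^\top x_i,\ \Sigma\right)\, dz_q \cdots dz_1 ,
  \qquad
  A_{ij} =
  \begin{cases}
    (0, \infty)   & \text{if } y_{ij} = 1, \\
    (-\infty, 0]  & \text{if } y_{ij} = 0 .
  \end{cases}
\]
```

Under this formulation, each observation contributes a $q$-dimensional orthant probability that has no closed form for general $\Sigma$, which is why the cost of evaluating the likelihood grows quickly with the number of responses $q$.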