1. Joint species distribution models (JSDMs) explain spatial variation in community composition through the contributions of the environment, biotic associations, and possibly spatially structured residual covariance. They show great promise as a general analytical framework for community ecology and macroecology, but current JSDMs, even when approximated by latent variables, scale poorly to large datasets, which limits their usefulness for the large community datasets now emerging from, e.g., metabarcoding and metagenomics.
2. Here, we present a novel, more scalable JSDM (sjSDM) that circumvents the need for latent variables by using Monte Carlo integration of the joint JSDM likelihood, and that allows flexible elastic net regularization on all model components. We implemented sjSDM in PyTorch, a modern machine learning framework that can run calculations on both CPUs and GPUs. Using simulated communities with known species-species associations and different numbers of species and sites, we compare sjSDM with state-of-the-art JSDM implementations in terms of computational runtime and the accuracy of the inferred species-species and species-environment associations.
3. We find that sjSDM is orders of magnitude faster than existing JSDM algorithms (even when run on the CPU) and can be scaled to very large datasets. Despite the dramatically improved speed, sjSDM produces more accurate estimates of species association structures than alternative JSDM implementations. We demonstrate the applicability of sjSDM to big community data using an eDNA case study with thousands of fungal operational taxonomic units (OTUs).
4. Our sjSDM approach makes it possible to fit JSDMs to large community datasets with hundreds or thousands of species, substantially extending the applicability of JSDMs in ecology. We provide our method as an R package to facilitate its use in practical data analysis.
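The idea of replacing latent variables with Monte Carlo integration of the joint likelihood can be illustrated with a minimal sketch. It is not the sjSDM implementation. It assumes a multivariate probit model whose residual covariance is parameterized as σσᵀ + I, so that species are conditionally independent given a standard normal draw ζ. The function names (`mc_site_likelihood`, `elastic_net_penalty`), the parameter layout, and the exact form of the penalty are hypothetical choices made for illustration.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mc_site_likelihood(y, x, beta, sigma, n_draws=2000, rng=None):
    """Monte Carlo estimate of P(Y = y) for one site (illustrative only).

    Assumed model: Z_j = sum_k x[k] * beta[k][j] + sum_d sigma[j][d] * zeta[d] + e_j,
    with zeta ~ N(0, I), e ~ N(0, I), and Y_j = 1 if Z_j > 0.
    Given zeta the species are independent, so the joint probability is the
    average over draws of a product of univariate probit terms.
    """
    rng = rng or random.Random(0)
    n_species = len(y)
    n_dims = len(sigma[0])
    # Environmental linear predictor for each species.
    eta = [sum(x[k] * beta[k][j] for k in range(len(x))) for j in range(n_species)]
    total = 0.0
    for _ in range(n_draws):
        zeta = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        prob = 1.0
        for j in range(n_species):
            shift = sum(sigma[j][d] * zeta[d] for d in range(n_dims))
            p = normal_cdf(eta[j] + shift)
            prob *= p if y[j] == 1 else 1.0 - p
        total += prob
    return total / n_draws


def elastic_net_penalty(weights, lam, alpha):
    """lam * (alpha * sum|w| + (1 - alpha) * sum w^2) over a flat list of weights."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)
```

In this sketch, a penalized objective would be the negative sum of `math.log(mc_site_likelihood(...))` over sites plus `elastic_net_penalty` applied to the entries of `beta` and `sigma`. The implied species-species covariance is σσᵀ + I, which is where the association estimates would come from under these assumptions.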