Compositional data find broad application across diverse fields due to their efficacy in representing proportions or percentages of various components within a whole. Spatial dependencies often exist in compositional data, particularly when the data represents different land uses or ecological variables. Ignoring the spatial autocorrelations in modelling of compositional data may lead to incorrect estimates of parameters. Hence, it is essential to incorporate spatial information into the statistical analysis of compositional data to obtain accurate and reliable results. However, traditional statistical methods are not directly applicable to compositional data due to the correlation between its observations, which are constrained to lie on a simplex. To address this challenge, the Dirichlet distribution is commonly employed, as its support aligns with the nature of compositional vectors. Specifically, the R package DirichletReg provides a regression model, termed Dirichlet regression, tailored for compositional data. However, this model fails to account for spatial dependencies, thereby restricting its utility in spatial contexts. In this study, we introduce a novel spatial autoregressive Dirichlet regression model for compositional data, adeptly integrating spatial dependencies among observations. We construct a maximum likelihood estimator for a Dirichlet density function augmented with a spatial lag term. We compare this spatial autoregressive model with the same model without spatial lag, where we test both models on synthetic data as well as two real datasets, using different metrics. By considering the spatial relationships among observations, our model provides more accurate and reliable results for the analysis of compositional data. The model is further evaluated against a spatial multinomial regression model for compositional data, and their relative effectiveness is discussed.
翻译:暂无翻译