Compositional data arise when count observations are normalised into proportions adding up to unity. To allow use of standard statistical methods, compositional proportions can be mapped from the simplex into the Euclidean space through the isometric log-ratio (ilr) transformation. When the counts follow a multinomial distribution with fixed class-specific probabilities, the distribution of the ensuing ilr coordinates has been shown to be asymptotically multivariate normal. We here derive an asymptotic normal approximation to the distribution of the ilr coordinates when the counts show overdispersion under the Dirichlet-multinomial mixture model. Using a simulation study, we then investigate the practical applicability of the approximation against the empirical distribution of the ilr coordinates under varying levels of extra-multinomial variation and the total count. The approximation works well, except with a small total count or high amount of overdispersion. These empirical results remain even under population-level heterogeneity in the total count. Our work is motivated by microbiome data, which often exhibit considerable extra-multinomial variation and are increasingly treated as compositional through scaling taxon-specific counts into proportions. We conclude that if the analysis of empirical data relies on normality of the ilr coordinates, it may be advisable to choose a taxonomic level where counts are less sparse so that the distribution of taxon-specific class probabilities remains unimodal.
翻译:暂无翻译