Real-world compositional datasets often contain structural zeros. These arise when, due to physical limitations, one or more variables are intrinsically zero for a subset of the population under study. The classical Aitchison approach requires all components of a composition to be strictly positive, because adapting the most widely used statistical techniques to the compositional framework relies on computing logratios of these components. Datasets containing structural zeros are therefore usually split into two subsets: one containing the observations with structural zeros and one containing all the other data. Statistical analysis is then performed on the two subsets separately, under the assumption that they are drawn from two different subpopulations. However, this approach may lead to incomplete results when the split into two populations is merely artificial. To overcome this limitation and increase the robustness of such an approach, we introduce a statistical test to check whether the first K principal components of the two datasets span the same vector space. An approximation of the corresponding null distribution is derived analytically when the data are normally distributed on the simplex, and through a nonparametric bootstrap approach otherwise. Results on simulated data show that the proposed procedure can discriminate scenarios where the subpopulations share a common subspace from those where they are actually distinct. The performance of the proposed method is also tested on an experimental dataset of microbiome measurements.
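As an illustration of what comparing the spaces spanned by the first K principal components can look like in practice, the following is a minimal Python sketch. It is not the test statistic or the null distribution proposed in this work. It assumes that both subsets have already been restricted to a common set of strictly positive parts, applies the centered logratio (clr) transform, and measures the discrepancy between the two K-dimensional principal subspaces as the sum of squared sines of their principal angles. The function names `clr`, `leading_pcs`, and `subspace_distance` are hypothetical, and the choice of distance is an assumption made only for illustration.

```python
import numpy as np


def clr(X):
    """Centered logratio transform of strictly positive compositions (one per row)."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)


def leading_pcs(Z, k):
    """Orthonormal basis (as columns) of the first k principal directions of Z."""
    Zc = Z - Z.mean(axis=0, keepdims=True)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Vt[:k].T


def subspace_distance(X1, X2, k):
    """Sum of squared sines of the principal angles between the two
    k-dimensional principal subspaces (an illustrative choice of distance,
    assumed here; 0 means the subspaces coincide, k means they are orthogonal)."""
    V1 = leading_pcs(clr(X1), k)
    V2 = leading_pcs(clr(X2), k)
    # Singular values of V1^T V2 are the cosines of the principal angles.
    cosines = np.linalg.svd(V1.T @ V2, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)
    return float(np.sum(1.0 - cosines**2))
```

A value near zero suggests that the two subsets share the same K-dimensional principal subspace. Deciding whether an observed value is significantly different from zero requires a null distribution, which this sketch deliberately leaves out.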