Biological sequencing data consist of read counts, e.g. of specified taxa and often exhibit sparsity (zero-count inflation) and overdispersion (extra-Poisson variability). As most sequencing techniques provide an arbitrary total count, taxon-specific counts should ideally be treated as proportions under the compositional data-analytic framework. There is increasing interest in the role of the gut microbiome composition in mediating the effects of different exposures on health outcomes. Most previous approaches to compositional mediation have addressed the problem of identifying potentially mediating taxa among a large number of candidates. We here consider causal inference in compositional mediation when a priori knowledge is available about the hierarchy for a restricted number of taxa, building on a single hypothesis structured in terms of contrasts between appropriate sub-compositions. Based on the theory on multiple contemporaneous mediators and the assumed causal graph, we define non-parametric estimands for overall and coordinate-wise mediation effects, and show how these indirect effects can be estimated from empirical data based on simple parametric linear models. The mediators have straightforward and coherent interpretations, related to specific causal questions about the interrelationships between the sub-compositions. We perform a simulation study focusing on the impact of sparsity and overdispersion on estimation of mediation. While unbiased, the precision of the estimators depends, for any given magnitude of indirect effect, on sparsity and the relative magnitudes of exposure-to-mediator and mediator-to-outcome effects in a complex manner. We demonstrate the approach on empirical data, finding an inverse association of fibre intake on insulin level, mainly attributable to direct rather than indirect effects.
翻译:暂无翻译