Environmental health studies increasingly measure endogenous omics data ($\boldsymbol{M}$) to study the intermediary biological pathways by which an exogenous exposure ($\boldsymbol{A}$) affects a health outcome ($\boldsymbol{Y}$), given confounders ($\boldsymbol{C}$). Mediation analysis is frequently carried out to understand such mechanisms. If intermediary pathways are of interest, then there is likely existing literature establishing the statistical and biological significance of the total effect, defined as the effect of $\boldsymbol{A}$ on $\boldsymbol{Y}$ given $\boldsymbol{C}$. For mediation models with continuous outcomes and mediators, we show that leveraging external summary-level information on the total effect improves the estimation efficiency of the natural direct and indirect effects. Moreover, the efficiency gain depends on the asymptotic partial $R^2$ between the outcome ($\boldsymbol{Y}\mid\boldsymbol{M},\boldsymbol{A},\boldsymbol{C}$) and total effect ($\boldsymbol{Y}\mid\boldsymbol{A},\boldsymbol{C}$) models, with smaller values benefiting direct effect estimation and larger values benefiting indirect effect estimation. We robustify our estimation procedure to incongenial external information by assuming the total effect follows a random distribution. This framework allows shrinkage towards the external information if the total effects in the internal and external populations agree. We illustrate our methodology using data from the Puerto Rico Testsite for Exploring Contamination Threats, where cytochrome P450 metabolites are hypothesized to mediate the effect of phthalate exposure on gestational age at delivery. External information on the total effect comes from a recently published pooled analysis of 16 studies. The proposed framework blends mediation analysis with emerging data integration techniques.