Epidemiological evidence suggests that simultaneous exposure to multiple environmental risk factors (Es) can increase disease risk by more than the sum of the effects of each exposure acting alone. The interaction between a gene and multiple Es on disease risk is termed synergistic gene-environment interaction (synG$\times$E). Varying multi-index coefficient models (VMICM) are a promising tool for modeling synG$\times$E effects and for understanding how multiple Es jointly modify genetic risk for a disease outcome. In this work, we proposed a three-step variable selection approach for VMICM that separates gene variables by type of effect: varying, non-zero constant, and zero, which correspond respectively to nonlinear synG$\times$E, no synG$\times$E, and no genetic effect. Among the multiple environmental exposure variables, we also estimated and selected those that contribute to the synergistic interaction effect. We examined the oracle property of the proposed variable selection approach theoretically. Extensive simulation studies were conducted to evaluate the finite-sample performance of the method with both continuous and discrete gene variables. An application to a real dataset further demonstrated the utility of the method. The method applies broadly to settings where the goal is to identify synergistic interaction effects.
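To make the three effect types concrete, the display below sketches one common way a varying multi-index coefficient model is written. The abstract does not state the model, so the form and every symbol here are assumptions for illustration: $Y$ is the outcome, $G_1,\dots,G_p$ are the gene variables, $X=(X_1,\dots,X_q)^\top$ collects the environmental exposures, $\beta_d$ is an index (loading) vector, and $m_d$ is an unknown smooth function.

```latex
% Illustrative form only; the notation is assumed, not taken from the abstract.
\begin{align*}
  E(Y \mid X, G) &= m_0(\beta_0^\top X) + \sum_{d=1}^{p} m_d(\beta_d^\top X)\, G_d,
  \qquad \|\beta_d\| = 1 \ \text{for identifiability}, \\[4pt]
  m_d(\cdot) \ \text{non-constant} &\;\Longrightarrow\; \text{varying effect (nonlinear syn}G{\times}E\text{)}, \\
  m_d(\cdot) \equiv c_d \neq 0 &\;\Longrightarrow\; \text{non-zero constant effect (no syn}G{\times}E\text{)}, \\
  m_d(\cdot) \equiv 0 &\;\Longrightarrow\; \text{zero effect (no genetic effect)}.
\end{align*}
```

Under this reading, selecting the important environmental variables amounts to identifying which components of $\beta_d$ are non-zero, since an exposure with a zero loading does not enter the index $\beta_d^\top X$.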