Stability selection is an attractive approach for identifying sparse sets of features jointly associated with an outcome in high-dimensional settings. We introduce an automated calibration procedure based on the maximisation of an in-house stability score, which accommodates a priori-known block structure in the data (e.g. multi-OMIC data). The procedure applies to (LASSO-)penalised regression and graphical models. Simulations show that our approach outperforms both non-stability-based approaches and stability selection using the original calibration. Application of the multi-block graphical LASSO to real (epigenetic and transcriptomic) data from the Norwegian Women and Cancer study reveals a central, credible and novel cross-OMIC role of LRRN3 in the biological response to smoking. The proposed approaches are implemented in the R package sharp.
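The sketch below illustrates the basic mechanics of stability selection for LASSO regression. It refits the model on many random half-subsamples, records how often each feature gets a non-zero coefficient, and keeps the features whose selection proportion reaches a threshold. It uses only NumPy with a hand-written coordinate-descent LASSO. The helper names (`lasso_cd`, `selection_proportions`), the toy data, the penalty `lam` and the threshold `pi_thr` are all illustrative assumptions. The penalty and threshold are fixed by hand rather than calibrated. The sketch does not reproduce the stability-score calibration, the block structure, the graphical models or the sharp API.

```python
import numpy as np


def lasso_cd(X, y, lam, max_iter=200, tol=1e-6):
    """Hypothetical helper: LASSO by cyclic coordinate descent.

    Minimises (1 / 2n) * ||y - X b||^2 + lam * ||b||_1.
    Assumes X and y are already centred, so no intercept is fitted.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-feature curvature of the loss
    resid = y.astype(float).copy()     # residual y - X @ beta (beta = 0 initially)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the partial residual that excludes it.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def selection_proportions(X, y, lam, n_subsamples=100, frac=0.5, seed=0):
    """Hypothetical helper: fraction of subsamples in which each feature is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = int(frac * n)
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=m, replace=False)  # subsample without replacement
        Xs = X[idx] - X[idx].mean(axis=0)           # centre within the subsample
        ys = y[idx] - y[idx].mean()
        counts += lasso_cd(Xs, ys, lam) != 0        # count non-zero coefficients
    return counts / n_subsamples


# Toy data (assumed for illustration): 3 true signals among 20 features.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
beta_true = np.zeros(20)
beta_true[:3] = 2.0
y = X @ beta_true + rng.standard_normal(200)

lam, pi_thr = 0.5, 0.9  # fixed by hand here, not calibrated
props = selection_proportions(X, y, lam)
stable = np.flatnonzero(props >= pi_thr)
print(np.round(props, 2))
print("stably selected features:", stable)
```

Half-sized subsamples drawn without replacement follow the original formulation of stability selection. On this toy example one would expect proportions near 1 for the three simulated signals and near 0 elsewhere. With real data, however, the result depends strongly on `lam` and `pi_thr`, which is the choice an automated calibration procedure is meant to make.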