Variable selection methods are required in practical statistical modeling to identify and include only the most relevant predictors, thereby improving model interpretability. Such methods are typically employed in regression models, and this article considers them for the Poisson Log Normal model (PLN, Chiquet et al., 2021). This model aims to explain multivariate count data using dependent variables, and its utility has been demonstrated in scientific fields such as ecology and agronomy. For the PLN model, most recent papers focus on sparse network inference by combining the likelihood with an L1-penalty on the precision matrix. In this paper, we propose to rely on a recent penalization method (SIC, O'Neill and Burke, 2023), which consists in smoothly approximating the L0-penalty and avoids calibrating a tuning parameter through a cross-validation procedure. Moreover, this work focuses on the coefficient matrix of the PLN model and establishes an inference procedure ensuring effective variable selection performance, so that the resulting fitted model explains multivariate count data using only relevant explanatory variables. Our proposal involves implementing a procedure that integrates the SIC penalization algorithm (epsilon-telescoping) and the PLN model fitting algorithm (a variational EM algorithm). To support our proposal, we provide theoretical results and insights about the penalization method, and we perform simulation studies to assess the method, which is also applied to real datasets.