Count time series arising in many applications are often overdispersed, meaning that their variance is much larger than their mean. This paper proposes a novel variable selection approach for such data, based on modelling them with sparse negative binomial generalized linear autoregressive moving average (GLARMA) models. The approach combines estimation of the autoregressive moving average (ARMA) coefficients and the overdispersion parameter of the GLARMA model with variable selection on the regression coefficients, carried out with regularised methods for Generalized Linear Models (GLM). We describe our three-step estimation procedure, which is implemented in the NBtsVarSel package. We evaluate the performance of the approach on synthetic data, compare it to other methods, and apply it to RNA sequencing data. Our approach is computationally efficient and outperforms other methods in selecting variables, i.e. in recovering the non-null regression coefficients.