We propose a generalisation of the logistic regression model that aims to account for non-linear main effects and complex interactions while keeping the model inherently explainable. We achieve this by starting with log-odds that are linear in the covariates and adding non-linear terms that each depend on at least two covariates. More specifically, we specify the model generatively, combining certain margins in natural exponential form with vine copulas. Estimation is nevertheless based on the discriminative likelihood, and a dependency between covariates is included in the model only if it contributes significantly to distinguishing the two classes. We also present a scheme for model selection and estimation. The methods described in this paper are implemented in the R package LogisticCopula. To assess the performance of our model, we ran an extensive simulation study. The results of this study, together with those from a couple of examples on real data, show that our model performs at least as well as natural competitors, especially in the presence of non-linearities and complex interactions, even when $n$ is not large compared to $p$.
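
To illustrate why a generative specification of this kind yields log-odds that are linear plus copula terms, the following is a minimal sketch rather than the paper's exact formulation. The notation is assumed for illustration: $\pi_y$ denotes the prior probability of class $y \in \{0,1\}$, each class-conditional margin is taken to be $f_{j,y}(x_j) = h_j(x_j)\exp\{\theta_{j,y}x_j - A_j(\theta_{j,y})\}$ with distribution function $F_{j,y}$, and $c_y$ is the class-conditional copula density. By Bayes' rule and Sklar's theorem,

$$
\log\frac{P(Y=1\mid x)}{P(Y=0\mid x)}
= \underbrace{\log\frac{\pi_1}{\pi_0} - \sum_{j=1}^{p}\bigl\{A_j(\theta_{j,1}) - A_j(\theta_{j,0})\bigr\}}_{\beta_0}
+ \sum_{j=1}^{p}\underbrace{(\theta_{j,1}-\theta_{j,0})}_{\beta_j}\,x_j
+ \log\frac{c_1\bigl(F_{1,1}(x_1),\dots,F_{p,1}(x_p)\bigr)}{c_0\bigl(F_{1,0}(x_1),\dots,F_{p,0}(x_p)\bigr)}.
$$

Under these assumptions the base measures $h_j$ cancel, so the first two terms reproduce ordinary logistic regression. The last term vanishes when both copulas are the independence copula; otherwise, if $c_1$ and $c_0$ are vine copulas, it is a sum of log pair-copula density ratios, each of which involves at least two covariates.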