Regression models are used in a wide range of applications and provide a powerful scientific tool for researchers in many fields. Linear or other simple parametric models are often insufficient to describe complex relationships between input variables and a response. Such relationships can be captured better by flexible approaches such as neural networks, but at the cost of less interpretable models and potential overfitting. Alternatively, specific parametric nonlinear functions can be used, but specifying such functions is generally complicated. In this paper, we introduce an approach for the construction and selection of highly flexible nonlinear parametric regression models. Nonlinear features are generated hierarchically, similarly to deep learning, but with additional flexibility in the types of features that can be considered. This flexibility, combined with variable selection, allows us to find a small set of important features and thereby obtain more interpretable models. Within the space of possible functions, we take a Bayesian approach, introducing priors for functions based on their complexity. A genetically modified mode jumping Markov chain Monte Carlo algorithm is adopted to perform Bayesian inference and to estimate posterior probabilities for model averaging. In various applications, we illustrate how our approach can be used to obtain meaningful nonlinear models. Additionally, we compare its predictive performance with that of several machine learning algorithms.
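To make the idea of hierarchically generated features with a complexity-based prior more concrete, the following is a minimal sketch and not the construction used in the paper. The set of transformations, the way complexity is counted, and the exponential form of the prior are all assumptions chosen for illustration; the names `Feature`, `transform`, `multiply`, and `log_prior` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical set of nonlinear transformations; the abstract does not list them.
TRANSFORMS = {
    "sin": math.sin,
    "exp": math.exp,
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
}


@dataclass(frozen=True)
class Feature:
    """A named feature with an assumed complexity count and an evaluator."""
    name: str
    complexity: int  # assumed measure: number of operations used to build it
    evaluate: Callable[[Sequence[float]], float]


def input_feature(index: int) -> Feature:
    """Depth-0 feature: the raw input variable x[index]."""
    return Feature(f"x{index}", 0, lambda x: x[index])


def transform(g: str, parents: Sequence[Feature], weights: Sequence[float]) -> Feature:
    """Build g(w_1 f_1 + ... + w_k f_k) from existing features (one level deeper)."""
    func = TRANSFORMS[g]
    parents, weights = tuple(parents), tuple(weights)
    terms = " + ".join(f"{w:g}*{p.name}" for w, p in zip(weights, parents))
    # One operation for g, one per addition, plus everything inside the parents.
    complexity = 1 + (len(parents) - 1) + sum(p.complexity for p in parents)
    return Feature(
        f"{g}({terms})",
        complexity,
        lambda x: func(sum(w * p.evaluate(x) for w, p in zip(weights, parents))),
    )


def multiply(a: Feature, b: Feature) -> Feature:
    """Build the product of two existing features."""
    return Feature(
        f"({a.name}*{b.name})",
        1 + a.complexity + b.complexity,
        lambda x: a.evaluate(x) * b.evaluate(x),
    )


def log_prior(features: Sequence[Feature], penalty: float = 1.0) -> float:
    """Unnormalised log prior that decreases with total feature complexity (assumed form)."""
    return -penalty * sum(f.complexity for f in features)


# Features are built from earlier features, so depth grows hierarchically.
x0, x1 = input_feature(0), input_feature(1)
inner = transform("sin", [x0], [1.0])
product = multiply(inner, x1)
outer = transform("sigmoid", [product, x0], [2.0, -1.0])

for f in (x0, inner, product, outer):
    print(f.name, f.complexity, log_prior([f]))
```

Under this assumed prior, a model built from `x0` and `inner` is favoured a priori over one that uses the deeper feature `outer`, which is the kind of preference for small, interpretable feature sets that the abstract describes. A search algorithm would then explore such feature sets and weigh this prior against the fit to the data.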