Deep neural networks have become very popular for modeling complex nonlinear processes due to their extraordinary ability to fit arbitrary nonlinear functions from data with minimal expert intervention. However, they are almost always overparameterized and challenging to interpret due to their internal complexity. Furthermore, the optimization process that finds the learned model parameters can be unstable because it can get stuck in local minima. In this work, we demonstrate the value of sparse regularization techniques for significantly reducing model complexity. We demonstrate this for an aluminium extraction process, a highly nonlinear system with many interrelated subprocesses. We trained a densely connected deep neural network to model the process and then compared the effects of sparsity-promoting ℓ1 regularization on generalizability, interpretability, and training stability. We found that the regularization significantly reduces model complexity compared to a corresponding dense neural network. We argue that this makes the model more interpretable, and show that training an ensemble of sparse neural networks with different parameter initializations often converges to similar model structures with similar learned input features. Furthermore, the empirical study shows that the resulting sparse models generalize better from small training sets than their dense counterparts.
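To make the technique concrete, the following is a minimal sketch (not the authors' code) of ℓ1-regularized training of a small fully connected network in PyTorch; the layer sizes, learning rate, and penalty weight `l1_lambda` are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical densely connected architecture standing in for the process model.
model = nn.Sequential(
    nn.Linear(8, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()
l1_lambda = 1e-4  # sparsity strength (assumed value)

def training_step(x, y):
    optimizer.zero_grad()
    loss = mse(model(x), y)
    # Add an ℓ1 penalty on all weights; minimizing it drives many weights
    # toward exactly zero, effectively pruning connections from the network.
    loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random stand-in data:
x = torch.randn(64, 8)
y = torch.randn(64, 1)
print(training_step(x, y))
```

Repeating this training loop from several random parameter initializations and inspecting which weights survive is one way to check whether the ensemble converges to similar sparse structures, as the abstract describes.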