Most stochastic gradient descent algorithms can optimize neural networks that are sub-differentiable in their parameters; however, this implies that the neural network's activation functions must exhibit a degree of continuity, which limits the model's uniform approximation capacity to continuous functions. This paper focuses on the case where the discontinuities arise from distinct sub-patterns, each defined on a different part of the input space. We propose a new discontinuous deep neural network model trainable via a decoupled two-step procedure that avoids passing gradient updates through the network's single, strategically placed discontinuous unit. We provide approximation guarantees for our architecture in the space of bounded continuous functions, as well as universal approximation guarantees in the space of piecewise continuous functions, which we introduce herein. We present a novel semi-supervised two-step training procedure for our discontinuous deep learning model, tailored to its structure, and we provide theoretical support for its effectiveness. The performance of our model, trained with the proposed procedure, is evaluated experimentally on both real-world financial datasets and synthetic datasets.
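To make the idea concrete, below is a minimal sketch, not the authors' exact architecture: a hypothetical model (`PiecewiseNet`) routes each input through one of several continuous sub-networks via a hard argmax gate, which plays the role of the single discontinuous unit, and an illustrative decoupled two-step routine (`train_two_step`) first fits the gate to region labels and then trains the sub-networks with the gate frozen, so gradients never pass through the discontinuity. All names, layer sizes, and the specific gating choice are assumptions for illustration only.

```python
# Illustrative sketch only; the paper's actual model and training procedure may differ.
import torch
import torch.nn as nn

class PiecewiseNet(nn.Module):
    def __init__(self, in_dim, out_dim, n_parts, hidden=64):
        super().__init__()
        # Continuous sub-networks, one per sub-pattern / region of the input space.
        self.parts = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, out_dim))
            for _ in range(n_parts)
        ])
        # Gating network whose hard argmax acts as the single discontinuous unit.
        self.gate = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, n_parts))

    def forward(self, x):
        region = self.gate(x).argmax(dim=-1)          # discontinuous: no gradient flows here
        outs = torch.stack([p(x) for p in self.parts], dim=1)
        return outs[torch.arange(x.size(0)), region]  # select each input's sub-network output

# Decoupled two-step training (illustrative):
# Step 1: fit the gate to (possibly partially labelled) region assignments.
# Step 2: freeze the routing; train each sub-network with ordinary SGD,
#         so gradient updates never cross the argmax.
def train_two_step(model, x, y, region_labels, epochs=100, lr=1e-3):
    gate_opt = torch.optim.Adam(model.gate.parameters(), lr=lr)
    for _ in range(epochs):                           # Step 1: learn the partition
        gate_opt.zero_grad()
        nn.functional.cross_entropy(model.gate(x), region_labels).backward()
        gate_opt.step()
    with torch.no_grad():                             # freeze the discontinuous routing
        region = model.gate(x).argmax(dim=-1)
    part_opt = torch.optim.Adam(model.parts.parameters(), lr=lr)
    for _ in range(epochs):                           # Step 2: fit the continuous pieces
        part_opt.zero_grad()
        outs = torch.stack([p(x) for p in model.parts], dim=1)
        pred = outs[torch.arange(x.size(0)), region]
        nn.functional.mse_loss(pred, y).backward()
        part_opt.step()
```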