Most stochastic gradient descent algorithms can optimize neural networks that are sub-differentiable in their parameters, which requires their activation functions to exhibit a degree of continuity. However, this continuity constraint on the activation function prevents these neural models from uniformly approximating discontinuous functions. This paper focuses on the case where the discontinuities arise from distinct sub-patterns, each defined on a different part of the input space. We propose a new discontinuous deep neural network model trainable via a decoupled two-step procedure that avoids passing gradient updates through the network's non-differentiable unit. We provide universal approximation guarantees for our architecture in the space of bounded continuous functions and in the space of piecewise continuous functions, which we introduce herein. We present a novel semi-supervised two-step training procedure for our discontinuous deep learning model and provide theoretical support for its effectiveness. The performance of our architecture is evaluated experimentally on two real-world datasets and one synthetic dataset.
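To make the decoupled idea concrete, below is a minimal, hypothetical sketch (not the paper's exact architecture or training procedure): Step 1 fixes a hard, non-differentiable assignment of inputs to sub-patterns, and Step 2 fits one continuous sub-network per part with ordinary SGD, so no gradient ever passes through the discontinuous selection unit. The threshold-based partition, the sub-network sizes, and the use of PyTorch are illustrative assumptions.

```python
# Hypothetical sketch of a decoupled two-step procedure, not the authors' method.
import torch
import torch.nn as nn

def make_subnet(in_dim=1, hidden=32):
    # A small continuous sub-network for one part of the input space.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

# Synthetic piecewise-continuous target: two sub-patterns split at x = 0.
x = torch.linspace(-1, 1, 512).unsqueeze(1)
y = torch.where(x < 0, torch.sin(3 * x), torch.cos(3 * x) + 2.0)

# Step 1 (assumed here to be a hard threshold): assign each sample to a part.
# In general this could be any classifier fit separately; it stays outside
# the gradient path, so it may be non-differentiable.
part = (x >= 0).long().squeeze(1)

# Step 2: train one continuous sub-network per part with ordinary SGD.
subnets = [make_subnet(), make_subnet()]
for k, net in enumerate(subnets):
    opt = torch.optim.SGD(net.parameters(), lr=0.05)
    xk, yk = x[part == k], y[part == k]
    for _ in range(500):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(xk), yk)
        loss.backward()
        opt.step()

def predict(x_new):
    # Inference: the non-differentiable unit only routes inputs to sub-networks.
    idx = (x_new[:, 0] >= 0).long()
    out = torch.empty(x_new.shape[0], 1)
    with torch.no_grad():
        for k, net in enumerate(subnets):
            mask = idx == k
            if mask.any():
                out[mask] = net(x_new[mask])
    return out
```

In a semi-supervised variant, the Step 1 partition could itself be estimated from labeled and unlabeled data rather than hard-coded, while still remaining outside the gradient path.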