Most stochastic gradient descent algorithms can optimize neural networks that are sub-differentiable in their parameters, which requires their activation function to exhibit a degree of continuity. However, this continuity constraint on the activation function prevents these neural models from uniformly approximating discontinuous functions. In this paper, we focus on the case where the discontinuities arise from distinct sub-patterns, each defined on different parts of the input space. Learning such a function involves identifying the partition of the input space, where each part describes a single continuous sub-pattern of the target function, and then uniformly approximating each of these sub-patterns individually. We propose a new discontinuous deep neural network model trainable via a decoupled two-step procedure that avoids passing gradient updates through the network's non-differentiable unit. We provide universal approximation guarantees for our architecture. These include a guarantee that its partition component can approximate any partition of the input space in the upper-Kuratowski sense and a guarantee that our architecture is dense in a large non-separable space of discontinuous functions. Quantitative approximation rates and guarantees for the learnability of a performance-optimizing partition are provided. The performance of our architecture is evaluated using the California Housing Market Dataset.
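To make the decoupled two-step idea concrete, the sketch below is a minimal illustration, not the paper's architecture: it uses off-the-shelf scikit-learn components, and all names (make_target, n_parts, gate, experts) are assumptions introduced here. Step 1 learns a partition of the input space with a separately trained classifier; step 2 fits one ordinary continuous regressor per part; the hard, non-differentiable part-selection is used only at inference, so no gradient update ever passes through it.

```python
# Minimal sketch of a decoupled two-step fit of a piecewise-continuous target.
# Illustrative only; names and choices here are assumptions, not the paper's method.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier, MLPRegressor

rng = np.random.default_rng(0)

def make_target(x):
    # Piecewise-continuous target: two continuous sub-patterns split at x = 0.5.
    return np.where(x[:, 0] < 0.5, np.sin(6 * x[:, 0]), 2.0 - x[:, 0] ** 2)

X = rng.uniform(0, 1, size=(2000, 1))
y = make_target(X)

# Step 1 (partition): cluster inputs jointly with targets to expose the discontinuity,
# then train a classifier that maps inputs to parts of the induced partition.
n_parts = 2
parts = KMeans(n_clusters=n_parts, n_init=10, random_state=0).fit_predict(
    np.hstack([X, y[:, None]]))
gate = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                     random_state=0).fit(X, parts)

# Step 2 (sub-patterns): fit one continuous regressor per part of the partition.
experts = []
for k in range(n_parts):
    mask = parts == k
    experts.append(MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000,
                                random_state=0).fit(X[mask], y[mask]))

def predict(X_new):
    # Hard, non-differentiable routing happens only here, at inference; it receives
    # no gradient updates because the two steps above were trained separately.
    k_hat = gate.predict(X_new)
    out = np.empty(len(X_new))
    for k in range(n_parts):
        sel = k_hat == k
        if sel.any():
            out[sel] = experts[k].predict(X_new[sel])
    return out

X_test = rng.uniform(0, 1, size=(500, 1))
print("test MSE:", np.mean((predict(X_test) - make_target(X_test)) ** 2))
```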