Many practical problems require the output of a machine learning model to satisfy a set of constraints, $K$. Nevertheless, there is no known guarantee that classical neural network architectures can exactly encode constraints while simultaneously achieving universality. We provide a quantitative constrained universal approximation theorem which guarantees that for any non-convex compact set $K$ and any continuous function $f:\mathbb{R}^n\rightarrow K$, there is a probabilistic transformer $\hat{F}$ whose randomized outputs all lie in $K$ and whose expected output uniformly approximates $f$. Our second main result is a "deep neural version" of Berge's Maximum Theorem (1963). The result guarantees that, given an objective function $L$, a constraint set $K$, and a family of soft constraint sets, there is a probabilistic transformer $\hat{F}$ that approximately minimizes $L$, whose outputs belong to $K$, and which approximately satisfies the soft constraints. Our results imply the first universal approximation theorem for classical transformers with exact convex constraint satisfaction. They also yield a chart-free universal approximation theorem for Riemannian manifold-valued functions subject to suitable geodesically convex constraints.
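For concreteness, the first guarantee can be read in the following quantitative form (a sketch in our own notation: $X \subseteq \mathbb{R}^n$ denotes a compact domain and $\varepsilon > 0$ a tolerance, neither of which is fixed in the abstract above; the precise architecture bounds and approximation rates are those of the main theorem): for every $\varepsilon > 0$ there exists a probabilistic transformer $\hat{F}$ such that
$$\hat{F}(x) \in K \ \text{almost surely for every } x \in X, \qquad \sup_{x \in X} \big\| \mathbb{E}\big[\hat{F}(x)\big] - f(x) \big\| \le \varepsilon.$$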