Kolmogorov and Arnold, in answering Hilbert's 13th problem (in the context of continuous functions), laid the foundations for the modern theory of neural networks (NNs). Their proof divides the representation of a multivariate function into two steps. The first (non-linear) inter-layer map gives a universal embedding of the data manifold into a single hidden layer, and the image of this embedding is patterned in such a way that a subsequent dynamic can be defined to solve for the second inter-layer map. We interpret this pattern as "minor concentration" of the almost-everywhere-defined Jacobians of the inter-layer map. Minor concentration amounts to sparsity for higher exterior powers of the Jacobians. We present a conceptual argument for how such sparsity may set the stage for the emergence of successively higher-order concepts in today's deep NNs, and we suggest two classes of experiments to test this hypothesis.
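To make the phrase "sparsity for higher exterior powers" concrete: in coordinates, the k-th exterior power of a Jacobian J is the matrix whose entries are the k x k minors of J (the k-th compound matrix). The following is a minimal sketch in Python with NumPy, not the construction used in the proof. The function names `compound_matrix` and `nonzero_minor_fraction`, the tolerance `tol`, and the toy Jacobian are all illustrative assumptions, and the fraction of non-vanishing minors is only one possible proxy for minor concentration.

```python
from itertools import combinations

import numpy as np


def compound_matrix(J, k):
    """Return the k-th compound matrix of J.

    Its entries are the k x k minors of J, indexed by k-subsets of rows
    and columns in lexicographic order. This is the matrix of the k-th
    exterior power of J in the standard bases.
    """
    m, n = J.shape
    row_sets = list(combinations(range(m), k))
    col_sets = list(combinations(range(n), k))
    C = np.empty((len(row_sets), len(col_sets)))
    for a, rows in enumerate(row_sets):
        for b, cols in enumerate(col_sets):
            C[a, b] = np.linalg.det(J[np.ix_(rows, cols)])
    return C


def nonzero_minor_fraction(J, k, tol=1e-9):
    """Fraction of k x k minors of J with magnitude above tol.

    Hypothetical sparsity proxy (an assumption of this sketch): a small
    value means the k-th exterior power of J is sparse.
    """
    C = compound_matrix(J, k)
    return np.count_nonzero(np.abs(C) > tol) / C.size


# Toy Jacobian of a map from 2 inputs to 3 hidden units (illustrative only).
J_toy = np.array([[1.0, 0.0],
                  [2.0, 0.0],
                  [0.0, 3.0]])
frac_k1 = nonzero_minor_fraction(J_toy, 1)  # entries of J itself
frac_k2 = nonzero_minor_fraction(J_toy, 2)  # 2 x 2 minors
```

In an experiment, `J_toy` would be replaced by the Jacobian of a trained layer evaluated at sample points, and the fraction tracked as k grows. The number of minors grows combinatorially with k, so this brute-force enumeration is practical only for small layers or small k.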