We show that feedforward neural networks with ReLU activation generalize on low complexity data, suitably defined. Given i.i.d. data generated from a simple programming language, the minimum description length (MDL) feedforward neural network which interpolates the data generalizes with high probability. We define this simple programming language, along with a notion of description length of such networks. We provide several examples on basic computational tasks, such as checking primality of a natural number, and more. For primality testing, our theorem shows the following. Suppose that we draw an i.i.d. sample of $\Theta(N^{\delta}\ln N)$ numbers uniformly at random from $1$ to $N$, where $\delta\in (0,1)$. For each number $x_i$, let $y_i = 1$ if $x_i$ is a prime and $0$ if it is not. Then with high probability, the MDL network fitted to this data accurately answers whether a newly drawn number between $1$ and $N$ is a prime or not, with test error $\leq O(N^{-\delta})$. Note that the network is not designed to detect primes; minimum description learning discovers a network which does so.
翻译:暂无翻译