We study the family of functions that are represented by a linear convolutional neural network (LCN). These functions form a semi-algebraic subset of the set of linear maps from input space to output space. In contrast, the families of functions represented by fully-connected linear networks form algebraic sets. We observe that the functions represented by LCNs can be identified with polynomials that admit certain factorizations, and we use this perspective to describe the impact of the network's architecture on the geometry of the resulting function space. We further study the optimization of an objective function over an LCN, analyzing critical points in function space and in parameter space, and describing dynamical invariants for gradient descent. Overall, our theory predicts that the optimized parameters of an LCN will often correspond to repeated filters across layers, or filters that can be decomposed as repeated filters. We also conduct numerical and symbolic experiments that illustrate our results and present an in-depth analysis of the landscape for small architectures.
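The identification of LCN functions with factorizable polynomials can be seen in the simplest setting: composing convolutional layers convolves their filters, which is the same as multiplying the polynomials whose coefficients are the filter entries. The following is a minimal sketch of this fact, assuming 1D single-channel filters, stride 1, full convolution, and no bias; the filter names and sizes are illustrative, not taken from the paper.

    import numpy as np

    # Two layers of a linear convolutional network, each given by a 1D filter
    # (single channel, stride 1, "full" convolution, no bias).
    w1 = np.array([1.0, 2.0])        # layer-1 filter (hypothetical values)
    w2 = np.array([3.0, -1.0, 4.0])  # layer-2 filter (hypothetical values)

    x = np.random.randn(6)           # arbitrary input signal

    # Applying the two convolutional layers in sequence ...
    two_layer = np.convolve(w2, np.convolve(w1, x))

    # ... is the same linear map as a single convolution with the composite
    # filter w = w2 * w1, i.e. the product of the two filter polynomials.
    w = np.convolve(w2, w1)          # equivalently np.polymul(w2, w1)
    one_layer = np.convolve(w, x)

    assert np.allclose(two_layer, one_layer)

In this view, an end-to-end filter is representable by a given LCN architecture exactly when its polynomial admits a factorization into factors whose degrees match the layer filter sizes, which is the perspective the abstract refers to.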