Learning with neural networks relies on the complexity of the representable functions, but more importantly, on the particular assignment of typical parameters to functions of different complexity. Taking the number of activation regions as a complexity measure, recent works have shown that the practical complexity of deep ReLU networks is often far from the theoretical maximum. In this work, we show that this phenomenon also occurs in networks with maxout (multi-argument) activation functions and when considering the decision boundaries in classification tasks. We also show that the parameter space has a multitude of full-dimensional regions with widely different complexity, and obtain nontrivial lower bounds on the expected complexity. Finally, we investigate different parameter initialization procedures and show that they can increase the speed of convergence in training.
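To make the complexity measure concrete, the following is a minimal sketch (not the authors' code) of a maxout network, where each unit outputs the maximum of K affine functions, together with an empirical count of activation regions on a 2-D slice of the input space; the layer widths, rank K, and sampling grid are illustrative assumptions.

```python
# Minimal sketch: counting activation regions of a maxout network
# empirically via distinct activation patterns on a 2-D input slice.
# All sizes (rank K, widths, grid) are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def maxout_layer(x, W, b):
    """One maxout layer: each unit takes the max of K affine functions.

    x: (d,) input; W: (n, K, d) weights; b: (n, K) biases.
    Returns the (n,) output and the (n,) argmax pattern recording
    which of the K pieces is active per unit.
    """
    pre = np.einsum('nkd,d->nk', W, x) + b   # (n, K) pre-activations
    pattern = pre.argmax(axis=1)             # active linear piece per unit
    return pre.max(axis=1), pattern

# A small network 2 -> 8 -> 8 with rank-3 maxout units.
K, dims = 3, [2, 8, 8]
params = [(rng.normal(size=(n, K, d)) / np.sqrt(d), rng.normal(size=(n, K)))
          for d, n in zip(dims[:-1], dims[1:])]

def activation_pattern(x):
    """Concatenated argmax pattern over all layers; the pattern
    identifies the linear region containing the input x."""
    patterns = []
    for W, b in params:
        x, p = maxout_layer(x, W, b)
        patterns.append(p)
    return tuple(np.concatenate(patterns))

# Distinct patterns over a grid approximate how many linear regions
# the slice intersects; theory allows far larger maximal counts.
grid = np.linspace(-2, 2, 200)
regions = {activation_pattern(np.array([u, v])) for u in grid for v in grid}
print(f"distinct activation regions on the slice: {len(regions)}")
```

In practice, counts obtained this way for typical (e.g. Gaussian) parameters tend to sit far below the theoretical maximum, which is the gap between practical and maximal complexity that the abstract refers to.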