Neural networks have shown remarkable success, especially in overparameterized or "large" models. Despite growing empirical evidence and intuitive understanding, a formal mathematical justification for the behavior of such models, particularly regarding overfitting, remains incomplete. In this paper, we propose a general regularization framework to study the Mean Integrated Squared Error (MISE) of neural networks. This framework covers many commonly used neural networks and penalties, including ReLU and Sigmoid activations and $L^1$ and $L^2$ penalties. Based on our framework, we find that the MISE curve takes one of two possible shapes: double descent or monotone decreasing. The latter phenomenon is new in the literature, and the causes of both phenomena are studied theoretically. These findings challenge conventional statistical modeling frameworks and broaden recent results on the double descent phenomenon in neural networks.
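For reference, a minimal sketch of the standard MISE definition for an estimator $\hat f_n$ of a regression function $f$ is given below; the paper's exact setup, domain $\mathcal{X}$, and weighting may differ.

$$
\mathrm{MISE}(\hat f_n) \;=\; \mathbb{E}\!\left[\int_{\mathcal{X}} \bigl(\hat f_n(x) - f(x)\bigr)^2 \, dx\right],
$$

where the expectation is taken over the training sample used to construct $\hat f_n$.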