While artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization performance, have concurrently made significant strides. In this paper, we develop a novel theory for understanding the effectiveness of neural networks, which offers a perspective distinct from prior research. Specifically, we explore the rationale underlying a common practice in the construction of neural network models: sample splitting. Our findings indicate that the optimal hyperparameters derived from sample splitting yield a neural network model that asymptotically minimizes the prediction risk. We conduct extensive experiments across different application scenarios and network architectures, and the results support our theory.