A central problem in data science is to use potentially noisy samples of an unknown function to predict values for unseen inputs. In classical statistics, predictive error is understood as a trade-off between the bias and the variance that balances model simplicity with its ability to fit complex functions. However, over-parameterized models exhibit counterintuitive behaviors, such as "double descent" in which models of increasing complexity exhibit decreasing generalization error. Others may exhibit more complicated patterns of predictive error with multiple peaks and valleys. Neither double descent nor multiple descent phenomena are well explained by the bias-variance decomposition. We introduce a novel decomposition that we call the generalized aliasing decomposition (GAD) to explain the relationship between predictive performance and model complexity. The GAD decomposes the predictive error into three parts: 1) model insufficiency, which dominates when the number of parameters is much smaller than the number of data points, 2) data insufficiency, which dominates when the number of parameters is much greater than the number of data points, and 3) generalized aliasing, which dominates between these two extremes. We demonstrate the applicability of the GAD to diverse applications, including random feature models from machine learning, Fourier transforms from signal processing, solution methods for differential equations, and predictive formation enthalpy in materials discovery. Because key components of the GAD can be explicitly calculated from the relationship between model class and samples without seeing any data labels, it can answer questions related to experimental design and model selection before collecting data or performing experiments. We further demonstrate this approach on several examples and discuss implications for predictive modeling and data science.
翻译:暂无翻译