Principal Component Analysis (PCA) is one of the most commonly used statistical methods for data exploration, and for dimensionality reduction wherein the first few principal components account for an appreciable proportion of the variability in the data. Less commonly, attention is paid to the last principal components because they do not account for an appreciable proportion of variability. However, this defining characteristic of the last principal components also qualifies them as combinations of variables that are constant across the cases. Such constant-combinations are important because they may reflect underlying laws of nature. In situations involving a large number of noisy covariates, the underlying law may not correspond to the last principal component, but rather to one of the last. Consequently, a criterion is required to identify the relevant eigenvector. In this paper, two examples are employed to demonstrate the proposed methodology; one from Physics, involving a small number of covariates, and another from Meteorology wherein the number of covariates is in the thousands. It is shown that with an appropriate selection criterion, PCA can be employed to ``discover" Kepler's third law (in the former), and the hypsometric equation (in the latter).
翻译:暂无翻译