Combining many cross-sectional return predictors (for example, in machine learning) often requires imputing missing values. We compare ad-hoc mean imputation with several alternative methods, including maximum likelihood. Surprisingly, maximum likelihood and ad-hoc methods lead to similar results. This is because predictors are largely independent: correlations cluster near zero and 10 principal components (PCs) span less than 50% of total variance. Independence implies that observed predictors are uninformative about missing predictors, which makes ad-hoc methods valid. In PC regression tests, 50 PCs are required to capture equal-weighted expected returns (30 PCs for value-weighted returns), regardless of the imputation method. We find similar invariance in neural network portfolios.
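The independence argument can be illustrated with a minimal sketch. It assumes jointly Gaussian predictors with known mean and covariance, which is a simplification and not the paper's estimation procedure. The function names `conditional_mean_impute` and `pc_variance_share`, the 100-predictor dimension, and the simulated data are all hypothetical. Under a diagonal covariance matrix, the conditional expectation of a missing predictor given the observed ones collapses to its unconditional mean, so the model-based imputation coincides with mean imputation.

```python
import numpy as np


def conditional_mean_impute(x, mu, sigma):
    """Fill NaNs in x with the Gaussian conditional mean E[x_miss | x_obs].

    Assumes x ~ N(mu, sigma) with mu and sigma known (an illustrative
    assumption; in practice they would have to be estimated).
    """
    miss = np.isnan(x)
    obs = ~miss
    out = x.copy()
    if not miss.any():
        return out
    if not obs.any():
        out[miss] = mu[miss]
        return out
    s_mo = sigma[np.ix_(miss, obs)]  # cov(missing, observed)
    s_oo = sigma[np.ix_(obs, obs)]   # cov(observed, observed)
    out[miss] = mu[miss] + s_mo @ np.linalg.solve(s_oo, x[obs] - mu[obs])
    return out


def pc_variance_share(sigma, k):
    """Share of total variance spanned by the k largest principal components."""
    eigvals = np.linalg.eigvalsh(sigma)[::-1]  # sort descending
    return eigvals[:k].sum() / eigvals.sum()


# Hypothetical setup: 100 standardized predictors, 30 of them missing.
rng = np.random.default_rng(0)
n_pred = 100
mu = np.zeros(n_pred)
x = rng.standard_normal(n_pred)
x[:30] = np.nan

# Independent predictors: the conditional mean equals the unconditional mean.
sigma_indep = np.eye(n_pred)
imputed_indep = conditional_mean_impute(x, mu, sigma_indep)

# Strongly correlated predictors (equicorrelation 0.8), for contrast:
# here the observed predictors are informative about the missing ones.
rho = 0.8
sigma_corr = rho * np.ones((n_pred, n_pred)) + (1.0 - rho) * np.eye(n_pred)
imputed_corr = conditional_mean_impute(x, mu, sigma_corr)

print("max |imputed - mean|, independent:", np.abs(imputed_indep[:30]).max())
print("max |imputed - mean|, correlated: ", np.abs(imputed_corr[:30]).max())
print("10-PC variance share, independent:", pc_variance_share(sigma_indep, 10))
print("10-PC variance share, correlated: ", pc_variance_share(sigma_corr, 10))
```

In the independent case the off-diagonal block `s_mo` is zero, so the correction term vanishes and the two imputations agree exactly. With real predictors, whose correlations only cluster near zero, the agreement would be approximate.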