We study semiparametric factor models for high-dimensional panels in which the factor loadings consist of a nonparametric component explained by observed covariates and an idiosyncratic component capturing unobserved heterogeneity. A key challenge in empirical applications is missing observations, which can distort both factor recovery and loading estimation. To address this, we develop a projected principal component analysis (PPCA) procedure that accommodates general missing-at-random mechanisms through inverse-probability weighting. We establish the consistency of the estimated factors and loading functions and derive their asymptotic distributions, allowing the sieve dimension to diverge and the time dimension T to be either fixed or growing. Unlike classical PCA, PPCA estimates the factors consistently even when T is fixed, and under missing data the limiting distributions are mixed normal with enlarged asymptotic variances. Simulations and an empirical application support the theoretical results. Overall, PPCA provides an effective and robust framework for estimating semiparametric factor models in the presence of missing data.
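The abstract does not spell out the estimator. The following is therefore only a minimal sketch of how inverse-probability weighting can be combined with projected PCA. It rests on several assumptions that are not taken from the paper. The propensities p_it are treated as known. Each observed entry is reweighted as W_it Y_it / p_it, and missing entries are set to zero. The sieve is an additive polynomial basis in the covariates. Factors are normalized so that F̂'F̂/T = I_K, which is a common convention. The names `sieve_basis` and `ipw_ppca` are hypothetical, and the paper's weighting, basis, and normalization may differ.

```python
import numpy as np


def sieve_basis(x, degree):
    """Hypothetical additive polynomial sieve: [1, x, x^2, ..., x^degree] per covariate."""
    cols = [np.ones((x.shape[0], 1))]
    for k in range(1, degree + 1):
        cols.append(x ** k)
    return np.hstack(cols)


def ipw_ppca(y, observed, prob, x, n_factors, degree=3):
    """Sketch of IPW + projected PCA (assumed form, not the paper's exact estimator).

    y        : N x T panel; missing entries may be NaN
    observed : N x T boolean mask W_it
    prob     : N x T assumed-known observation probabilities p_it > 0
    x        : N x d covariates driving the nonparametric loading component
    """
    n, t = y.shape
    # Inverse-probability weighting: W_it * Y_it / p_it, with zeros where Y is missing.
    y_w = np.where(observed, np.nan_to_num(y) / prob, 0.0)
    # Project each column of the weighted panel onto the sieve space of x.
    phi = sieve_basis(x, degree)
    coef, *_ = np.linalg.lstsq(phi, y_w, rcond=None)
    py = phi @ coef  # N x T, equals P @ y_w
    # Factors: sqrt(T) times the top eigenvectors of (PY)'(PY), so F'F / T = I_K.
    eigval, eigvec = np.linalg.eigh(py.T @ py)
    top = np.argsort(eigval)[::-1][:n_factors]
    f_hat = np.sqrt(t) * eigvec[:, top]
    # Fitted nonparametric loading component G(X), evaluated at the sample covariates.
    g_hat = py @ f_hat / t
    return f_hat, g_hat
```

In practice the propensities would typically be estimated, for example from a model of the missingness indicators. That step falls outside this sketch, which isolates the weighting and projection logic.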