In the era of big data, integrating multi-source functional data to extract a subspace that captures the variation shared across sources has attracted considerable attention. In practice, data collection procedures often follow source-specific protocols. Directly averaging sample covariance operators across sources implicitly assumes homogeneity, which may bias the recovery of both shared and source-specific variation patterns. To address this issue, we propose a projection-based data integration method that explicitly separates the shared and source-specific subspaces. The method first estimates source-specific projection operators via smoothing to accommodate the nonparametric nature of functional data. The shared subspace is then identified by examining the eigenvalues of the projection operator averaged across all sources. If a source-specific subspace is of interest, we project the associated source-specific covariance estimator onto the orthogonal complement of the estimated shared subspace and estimate the source-specific subspace from the resulting operator. We further establish the asymptotic properties of both the shared and source-specific subspace estimators. Extensive simulation studies demonstrate the effectiveness of the proposed method across a wide range of settings. Finally, we illustrate its practical utility with an application to air pollutant data.
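The three steps described above can be sketched in a few lines of NumPy. This is a minimal illustration on curves discretized to a common grid, not the paper's estimator: the smoothing step is omitted, the covariance matrices are taken as given, and the function names, the per-source ranks, and the eigenvalue cutoff `threshold=0.9` are all hypothetical choices made for the example. The idea it shows is that a direction shared by every source is an eigenvector of the averaged projection with eigenvalue close to 1, while source-specific directions receive smaller eigenvalues.

```python
import numpy as np


def top_eigvecs(sym, k):
    """Return the k leading eigenvectors (as columns) of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(sym)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]        # indices of the k largest
    return vecs[:, order]


def shared_subspace(covs, ranks, threshold=0.9):
    """Estimate an orthonormal basis of the shared subspace.

    covs      : list of (p, p) covariance matrices, one per source
    ranks     : assumed dimension of each source's leading subspace
    threshold : hypothetical cutoff; eigenvalues of the averaged
                projection above it are treated as "shared"
    """
    p = covs[0].shape[0]
    avg_proj = np.zeros((p, p))
    for cov, r in zip(covs, ranks):
        basis = top_eigvecs(cov, r)
        avg_proj += basis @ basis.T           # projection onto source subspace
    avg_proj /= len(covs)

    vals, vecs = np.linalg.eigh(avg_proj)
    return vecs[:, vals > threshold]


def source_specific_subspace(cov, shared, k):
    """Estimate k source-specific directions orthogonal to the shared basis."""
    p = cov.shape[0]
    complement = np.eye(p) - shared @ shared.T  # projector onto the complement
    return top_eigvecs(complement @ cov @ complement, k)
```

Calling `shared_subspace` on a list of per-source covariance matrices returns a basis whose column count is the estimated shared dimension, and `source_specific_subspace` can then be applied to any single source. In practice the cutoff and the ranks would need a data-driven choice, which this sketch does not attempt.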