As novel forms of data collection become increasingly common, traditional dimension reduction and data visualization techniques are proving inadequate for analyzing the resulting complex data. We propose a surrogate-assisted sufficient dimension reduction (SDR) method for the regression of a general metric-space-valued response on Euclidean predictors. The response objects are first mapped to a real-valued distance matrix using an appropriate metric; this matrix is then projected onto a large sample of random unit vectors to obtain scalar-valued surrogate responses. An ensemble of the subspace estimates from the regressions of the surrogate responses on the predictors is used to estimate the original central space. Within this framework, classical SDR methods such as ordinary least squares and sliced inverse regression are extended to metric-valued responses. The surrogate-assisted method applies to responses on compact metric spaces, including but not limited to Euclidean, distributional, and functional responses. An extensive simulation study on synthetic data shows that the proposed surrogate-assisted method outperforms existing competing methods where those methods are applicable. We also analyze the distributions and functional trajectories of county-level COVID-19 transmission rates in the U.S. as a function of demographic characteristics. Theoretical justifications are provided.
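The pipeline described above (distance matrix, random projections, ensemble of per-surrogate estimates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `surrogate_ols_directions`, the choice of Gaussian-based random unit vectors, the averaging of outer products as the ensemble step, and the toy data are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def surrogate_ols_directions(X, D, d=1, n_proj=200, seed=0):
    """Hypothetical surrogate-assisted OLS sketch (illustrative only).

    X : (n, p) Euclidean predictors
    D : (n, n) pairwise distance matrix of the response objects
    d : assumed dimension of the central space
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n                      # sample covariance of X

    # Random unit vectors in R^n (assumption: normalized Gaussian draws).
    U = rng.standard_normal((n, n_proj))
    U /= np.linalg.norm(U, axis=0, keepdims=True)

    # Each column of S is one scalar-valued surrogate response.
    S = D @ U
    Sc = S - S.mean(axis=0)

    # OLS direction for every surrogate: Sigma^{-1} cov(X, s_k).
    B = np.linalg.solve(Sigma, Xc.T @ Sc / n)  # shape (p, n_proj)

    # Ensemble step (assumption): average outer products, keep top-d eigenvectors.
    M = B @ B.T / n_proj
    _, evecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    return evecs[:, ::-1][:, :d]

# Toy example: a Euclidean response in R^2 that depends on X only through X[:, 0].
rng = np.random.default_rng(1)
n, p = 400, 5
X = rng.standard_normal((n, p))
Y = np.column_stack([X[:, 0], 2.0 * X[:, 0]]) + 0.1 * rng.standard_normal((n, 2))

# Pairwise Euclidean distances between response objects.
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)

B_hat = surrogate_ols_directions(X, D, d=1)
```

In this toy setting the estimated direction `B_hat` should load mainly on the first predictor. For distributional or functional responses, only the construction of `D` would change (for example, a Wasserstein or L2 distance), while the rest of the sketch stays the same.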