The classification of random objects within metric spaces without a vector structure has attracted increasing attention. However, the complexity inherent in such non-Euclidean data often restricts existing models to handle only a limited number of features, leaving a gap in real-world applications. To address this, we propose a data-adaptive filtering procedure to identify informative features from large-scale random objects, leveraging a novel Kolmogorov-Smirnov-type statistic defined on the metric space. Our method, applicable to data in general metric spaces with binary labels, exhibits remarkable flexibility. It enjoys a model-free property, as its implementation does not rely on any specified classifier. Theoretically, it controls the false discovery rate while guaranteeing the sure screening property. Empirically, equipped with a Wasserstein metric, it demonstrates superior sample performance compared to Euclidean competitors. When applied to analyze a dataset on autism, our method identifies significant brain regions associated with the condition. Moreover, it reveals distinct interaction patterns among these regions between individuals with and without autism, achieved by filtering hundreds of thousands of covariance matrices representing various brain connectivities.
翻译:暂无翻译