We consider quantile estimation in a semi-supervised setting, characterized by two available data sets: (i) a small or moderately sized labeled data set containing observations for a response and a set of possibly high-dimensional covariates, and (ii) a much larger unlabeled data set where only the covariates are observed. We propose a family of semi-supervised estimators for the response quantile(s) based on the two data sets, to improve the estimation accuracy compared to the supervised estimator, i.e., the sample quantile from the labeled data. These estimators use a flexible imputation strategy applied to the estimating equation, along with a debiasing step that allows for full robustness against misspecification of the imputation model. Further, a one-step update strategy is adopted to enable easy implementation of our method and to handle the complexity arising from the non-linear nature of the quantile estimating equation. Under mild assumptions, our estimators are fully robust to the choice of the nuisance imputation model, in the sense of always maintaining root-n consistency and asymptotic normality, while having improved efficiency relative to the supervised estimator. They also attain semi-parametric optimality if the relation between the response and the covariates is correctly specified via the imputation model. As an illustration of estimating the nuisance imputation function, we consider kernel-smoothing-type estimators on lower-dimensional and possibly estimated transformations of the high-dimensional covariates, and we establish novel results on their uniform convergence rates in high dimensions, involving responses indexed by a function class and the use of dimension reduction techniques. These results may be of independent interest. Numerical results on both simulated and real data confirm the improved performance of our semi-supervised approach, in terms of both estimation and inference.
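To make the imputation, debiasing, and one-step ideas concrete, the following is a minimal sketch and not the estimator as specified in the paper. It assumes the quantile estimating function is ψ(Y, θ) = 1{Y ≤ θ} − τ, that a fixed working model `impute_cdf(x, theta)` for P(Y ≤ θ | X = x) is supplied by the user (the paper instead estimates this nuisance function, e.g. by kernel smoothing), and that the density of Y at the initial estimate is approximated by a Gaussian kernel estimate with a rule-of-thumb bandwidth. All function and variable names here are hypothetical.

```python
import numpy as np


def ss_quantile_one_step(y, x_lab, x_unlab, tau, impute_cdf, bandwidth=None):
    """Illustrative one-step semi-supervised estimate of the tau-th quantile of Y.

    impute_cdf(x, theta) is a hypothetical working model for P(Y <= theta | X = x);
    it is allowed to be misspecified.
    """
    y = np.asarray(y, dtype=float)
    n = y.size

    # Supervised initial estimator: sample quantile of the labeled responses.
    theta0 = np.quantile(y, tau)

    # Imputed estimating function phi(x, theta0) = m(x, theta0) - tau,
    # evaluated on the labeled and on the pooled (labeled + unlabeled) covariates.
    phi_lab = np.asarray(impute_cdf(x_lab, theta0), dtype=float) - tau
    phi_unlab = np.asarray(impute_cdf(x_unlab, theta0), dtype=float) - tau
    phi_all = np.concatenate([phi_lab, phi_unlab])

    # Observed estimating function on the labeled data.
    psi_lab = (y <= theta0).astype(float) - tau

    # Debiased estimating equation: pooled imputation average plus a
    # labeled-data correction for the imputation error.
    score = phi_all.mean() + (psi_lab - phi_lab).mean()

    # Gaussian kernel density estimate of Y at theta0 (assumed choice).
    if bandwidth is None:
        bandwidth = 1.06 * y.std(ddof=1) * n ** (-1 / 5)
    u = (theta0 - y) / bandwidth
    f_hat = np.exp(-0.5 * u ** 2).sum() / (n * bandwidth * np.sqrt(2 * np.pi))

    # One-step (Newton-type) update of the initial estimator.
    return theta0 - score / f_hat


# Usage on simulated data: Y = X + logistic noise, so the logistic CDF of
# (theta - x) is a correctly specified working model in this toy example.
rng = np.random.default_rng(0)
n, N = 200, 20000
x_lab = rng.normal(size=n)
x_unlab = rng.normal(size=N)
y = x_lab + rng.logistic(size=n)


def logistic_cdf(x, theta):
    return 1.0 / (1.0 + np.exp(-(theta - x)))


estimate = ss_quantile_one_step(y, x_lab, x_unlab, 0.5, logistic_cdf)
```

In this sketch, a constant working model cancels out of the debiased score, so the update then depends only on the labeled data; this is a simple way to see why a poor imputation model does not bias the estimator. An informative working model lets the large unlabeled sample reduce the variance.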