Quantile regression is increasingly encountered in modern big data applications due to its robustness and flexibility. We consider the scenario of learning the conditional quantiles of a specific target population when the available data may go beyond the target and be supplemented from other sources that possibly share similarities with the target. A crucial question is how to properly distinguish and utilize useful information from other sources to improve the quantile estimation and inference at the target. We develop transfer learning methods for high-dimensional quantile regression by detecting informative sources whose models are similar to the target and utilizing them to improve the target model. We show that under reasonable conditions, the detection of the informative sources based on sample splitting is consistent. Compared to the naive estimator with only the target data, the transfer learning estimator achieves a much lower error rate as a function of the sample sizes, the signal-to-noise ratios, and the similarity measures among the target and the source models. Extensive simulation studies demonstrate the superiority of our proposed approach. We apply our methods to tackle the problem of detecting hard-landing risk for flight safety and show the benefits and insights gained from transfer learning of three different types of airplanes: Boeing 737, Airbus A320, and Airbus A380.
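To make the idea of detecting informative sources by sample splitting concrete, the following is a minimal toy sketch and not the procedure developed in the paper. It assumes an intercept-only quantile model (so the "model" is a single empirical quantile), whereas the paper treats high-dimensional regression. The function names, the tolerance `eps`, and the data are hypothetical and chosen only for illustration. A source is kept when pooling it with one half of the target data does not increase the check loss on the held-out half by more than `eps`.

```python
import math


def check_loss(u, tau):
    """Quantile (pinball) loss of a residual u at level tau."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def mean_check_loss(ys, q, tau):
    """Average check loss of the constant prediction q on the sample ys."""
    return sum(check_loss(y - q, tau) for y in ys) / len(ys)


def empirical_quantile(ys, tau):
    """Order statistic at position ceil(tau * n), a minimizer of the check loss."""
    s = sorted(ys)
    k = max(math.ceil(tau * len(s)) - 1, 0)
    return s[k]


def detect_informative(target, sources, tau, eps):
    """Return indices of sources judged informative (toy rule, an assumption).

    The target sample is split in two: one half is used for fitting,
    the other half is held out to compare losses.
    """
    train, hold = target[0::2], target[1::2]
    base = mean_check_loss(hold, empirical_quantile(train, tau), tau)
    keep = []
    for i, src in enumerate(sources):
        q_pooled = empirical_quantile(train + src, tau)
        if mean_check_loss(hold, q_pooled, tau) - base <= eps:
            keep.append(i)
    return keep


# Hypothetical data: source 0 resembles the target, source 1 is shifted far away.
target = [float(v) for v in range(1, 21)]
similar = [5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
shifted = [float(v) for v in range(101, 111)]
informative = detect_informative(target, [similar, shifted], tau=0.5, eps=0.5)
```

In this toy run the similar source leaves the pooled median unchanged, while the shifted source drags it upward and raises the held-out loss, so only the first source is retained. In a regression setting the empirical quantile would be replaced by a penalized quantile regression fit.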