Despite their empirical success, most existing listwise learning-to-rank (LTR) models are not built to be robust to labeling or annotation errors, distributional data shift, or adversarial data perturbations. To fill this gap, we introduce a new listwise LTR model called Distributionally Robust Multi-output Regression Ranking (DRMRR). Unlike existing methods, the scoring function of DRMRR is designed as a multivariate mapping from a feature vector to a vector of deviation scores, which captures local context information and cross-document interactions. DRMRR uses a Distributionally Robust Optimization (DRO) framework to minimize a multi-output loss function under the most adverse distribution in a neighborhood of the empirical data distribution, where the neighborhood is defined by a Wasserstein ball. We show that this is equivalent to a regularized regression problem with a matrix norm regularizer. Experiments on two real-world applications, medical document retrieval and drug response prediction, show that DRMRR notably outperforms state-of-the-art LTR models. We also conduct a comprehensive analysis to assess the resilience of DRMRR against several types of noise: Gaussian noise, adversarial perturbations, and label poisoning. The results show that DRMRR not only achieves significantly better performance than the baselines, but also maintains relatively stable performance as more noise is added to the data.
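As a rough illustration of the regularized-regression view mentioned above, the following sketch evaluates an objective of the form "average multi-output loss plus a weighted matrix norm". The linear scoring map `B`, the l1 loss, the Frobenius norm, and all function names are illustrative assumptions chosen for simplicity, not the formulation derived in the paper. In Wasserstein DRO reformulations of this kind, the weight `epsilon` typically corresponds to the radius of the Wasserstein ball.

```python
import math


def predict(B, x):
    """Map a feature vector x (length d) to a score vector (length k).

    B is a d x k coefficient matrix stored as a list of rows
    (hypothetical linear scoring function, assumed for illustration).
    """
    k = len(B[0])
    return [sum(x[j] * B[j][c] for j in range(len(x))) for c in range(k)]


def frobenius_norm(B):
    """Frobenius norm of B, used here as a stand-in matrix norm regularizer."""
    return math.sqrt(sum(v * v for row in B for v in row))


def regularized_objective(B, X, Y, epsilon):
    """Average l1 multi-output loss plus epsilon times a matrix norm of B."""
    total = 0.0
    for x, y in zip(X, Y):
        scores = predict(B, x)
        total += sum(abs(yc - sc) for yc, sc in zip(y, scores))
    return total / len(X) + epsilon * frobenius_norm(B)


# Toy data: two samples, two features, two outputs.
B = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[1.0, 2.0], [3.0, 5.0]]
objective = regularized_objective(B, X, Y, epsilon=0.5)
```

Increasing `epsilon` penalizes large coefficients more heavily, which is the mechanism through which a regularized formulation of this type trades empirical fit for robustness to perturbed data.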