This paper describes the THUIR team's approach to the WSDM Cup 2023 Pre-training for Web Search task, which requires participants to rank documents by relevance for each query. We propose a new data pre-processing method and use the processed data for both pre-training and fine-tuning. We also extract statistical, axiomatic, and semantic features to improve ranking performance, and then combine these features using a diverse set of learning-to-rank models. Experimental results demonstrate the effectiveness of our approach, which achieved second place in the competition.