Pre-training and fine-tuning have achieved significant advances in information retrieval (IR). A typical approach is to fine-tune all the parameters of large-scale pre-trained models (PTMs) on downstream tasks. As the model size and the number of tasks grow, such an approach becomes less feasible and prohibitively expensive. Recently, a variety of parameter-efficient tuning methods have been proposed in natural language processing (NLP) that fine-tune only a small number of parameters while still attaining strong performance. Yet there has been little effort to explore parameter-efficient tuning for IR. In this work, we first conduct a comprehensive study of existing parameter-efficient tuning methods at both the retrieval and re-ranking stages. Unlike the promising results in NLP, we find that these methods cannot achieve performance comparable to full fine-tuning at either stage when updating less than 1\% of the original model parameters. More importantly, we find that the existing methods are merely parameter-efficient, not learning-efficient, as they suffer from unstable training and slow convergence. To analyze the underlying reason, we conduct a theoretical analysis and show that the separation of the inserted trainable modules makes optimization difficult. To alleviate this issue, we propose to inject additional modules alongside the PTMs so that the originally scattered modules become connected. In this way, all the trainable modules form a pathway that smooths the loss surface and thus helps stabilize the training process. Experiments at both the retrieval and re-ranking stages show that our method significantly outperforms existing parameter-efficient methods and achieves performance comparable to, or even better than, full fine-tuning.
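For context, a minimal sketch of the kind of small trainable module that existing parameter-efficient methods insert into each frozen transformer layer, here a standard bottleneck adapter. The PyTorch-style implementation, the class name, and the dimensions (768, 64) are illustrative assumptions, not the paper's specific design; the point is that each such module is trained in isolation within its layer, which is the "scattered" structure referred to above.
\begin{verbatim}
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Illustrative bottleneck adapter (hypothetical dimensions).

    Only these few parameters are updated during tuning; the backbone
    PTM weights stay frozen, so far less than 1% of the model is trained.
    """
    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck transformation applied inside one layer;
        # adapters in different layers have no trainable connection to
        # each other, i.e. they remain "scattered" across the network.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
\end{verbatim}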