Listwise reranking with large language models (LLMs) enhances top-ranked results in retrieval-based applications. Because of context-size limits and the high inference cost of long inputs, reranking is typically performed over small fixed-size subsets of candidates, with the final ranking aggregated from these partial results. This fixed allocation of computation disregards query difficulty and document distribution, leading to inefficiencies. We propose AcuRank, an adaptive reranking framework that dynamically adjusts both the amount and the targets of computation based on uncertainty estimates over document relevance. Using a Bayesian TrueSkill model, we iteratively refine relevance estimates until they reach a sufficient confidence level; this explicit modeling of ranking uncertainty enables principled control over reranking behavior and avoids unnecessary updates to already-confident predictions. Results on the TREC-DL and BEIR benchmarks show that our method consistently achieves a superior accuracy-efficiency trade-off and scales better with compute than fixed-computation baselines. These results highlight the effectiveness and generalizability of our method across diverse retrieval tasks and LLM-based reranking models.
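To make the uncertainty-driven loop concrete, the sketch below illustrates one plausible reading of the idea: documents carry TrueSkill ratings, each iteration sends the most uncertain candidates near the top-k boundary to an LLM listwise reranker, and the loop stops once those ratings are confident enough. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses the open-source `trueskill` Python package, and `llm_rerank`, the window size, and the sigma-based stopping rule are hypothetical choices introduced here for illustration.

```python
# Sketch of uncertainty-driven listwise reranking with TrueSkill ratings.
# Assumes the `trueskill` PyPI package; `llm_rerank` is a hypothetical
# stand-in for an LLM listwise reranker, not an API from the paper.
from trueskill import Rating, rate


def llm_rerank(query: str, docs: list[str]) -> list[int]:
    """Hypothetical LLM call: returns indices of `docs`, most to least relevant."""
    raise NotImplementedError  # replace with an actual LLM reranking call


def acurank_sketch(query, docs, k=10, window=20, sigma_stop=1.0, max_iters=10):
    # One rating per document; the default prior is mu=25, sigma≈8.33.
    ratings = [Rating() for _ in docs]
    for _ in range(max_iters):
        # Current ranking by mean relevance estimate.
        order = sorted(range(len(docs)), key=lambda i: ratings[i].mu, reverse=True)
        # Documents near the top-k boundary decide the final top-k set.
        boundary = order[max(0, k - window // 2): k + window // 2]
        # Stop once every boundary document is confidently placed.
        if max(ratings[i].sigma for i in boundary) < sigma_stop:
            break
        # Spend LLM compute only on the most uncertain candidates.
        batch = sorted(boundary, key=lambda i: ratings[i].sigma, reverse=True)[:window]
        ranked = llm_rerank(query, [docs[i] for i in batch])
        # Treat the LLM's ordering as match outcomes and update ratings;
        # `rate` takes rating groups listed from best to worst by default.
        groups = [(ratings[batch[j]],) for j in ranked]
        updated = rate(groups)
        for j, (new_rating,) in zip(ranked, updated):
            ratings[batch[j]] = new_rating
    return sorted(range(len(docs)), key=lambda i: ratings[i].mu, reverse=True)[:k]
```

The design point this sketch tries to capture is the one the abstract emphasizes: compute is allocated adaptively, so confidently ranked documents are never re-sent to the LLM, while ambiguous documents near the top-k cutoff receive additional reranking passes.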