Recent dense retrievers increasingly leverage the robust text understanding capabilities of Large Language Models (LLMs), encoding queries and documents into a shared embedding space for effective retrieval. However, most existing methods represent each document with a single embedding, which is less effective at capturing its multifaceted semantics and thereby limits matching accuracy. In this paper, we propose Deliberate Thinking based Dense Retriever (Debater), a novel approach that enhances document representations by incorporating a step-by-step thinking process. Debater introduces a Chain-of-Deliberation mechanism, which iteratively refines document embeddings through a continuous chain-of-thought. To integrate information from various thinking steps, Debater further employs a Self Distillation mechanism that identifies and fuses the most informative steps into a unified embedding. Experimental results show that Debater significantly outperforms existing methods across several retrieval benchmarks, demonstrating superior accuracy and robustness. All codes and datasets are available at https://github.com/OpenBMB/DEBATER.
翻译:暂无翻译