Retrieval-augmented generation improves large language models by grounding outputs in external knowledge sources, reducing hallucinations and addressing knowledge cutoffs. However, standard embedding-based retrieval fails to capture the complexity of multi-concept queries, particularly in domains like biomedicine, where biological data are inherently high-dimensional. For example, omics datasets and clinical reports simultaneously exhibit numerous molecular, cellular, and physiological features. We present Stochastic Latent Matching (STHLM), a generative vector search method that samples query-conditioned embeddings from text or image inputs to enhance retrieval performance. Analogous to how Chain-of-Thought reasoning enables language models to "think longer" on complex problems, STHLM allows retrieval systems to "search wider" through iterative sampling. STHLM demonstrates substantial improvements over classical vector retrieval across diverse benchmarks, including scientific literature, clinical notes, and tissue images, boosting retrieval performance by 10-30% through test-time compute (trading latency for accuracy), while enabling up to a 10-fold compression of embedding dimensions.
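To make the "sample, then search wider" idea concrete, the following is a minimal sketch, not the paper's implementation: a generative encoder is assumed to output a Gaussian over query-conditioned latents, several embeddings are drawn via the reparameterization trick, each sample queries the corpus, and scores are max-pooled across samples. All names (encode_query, sample_embeddings, search, n_samples) are illustrative placeholders.

```python
# Hypothetical sketch of stochastic latent matching for retrieval:
# draw several query-conditioned embeddings and aggregate their scores.
import numpy as np

def encode_query(text: str, dim: int = 64):
    """Stand-in generative encoder: returns mean and log-variance
    of a query-conditioned latent distribution (assumed interface)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    mu = rng.normal(size=dim)
    log_var = rng.normal(scale=0.1, size=dim)
    return mu, log_var

def sample_embeddings(mu, log_var, n_samples: int, rng):
    """Draw n_samples latent query vectors via the reparameterization trick."""
    eps = rng.normal(size=(n_samples, mu.shape[0]))
    z = mu + np.exp(0.5 * log_var) * eps
    return z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize

def search(corpus_emb, query_samples, top_k: int = 5):
    """Score the corpus against every sampled query embedding and
    aggregate by max-pooling over samples ("search wider")."""
    sims = query_samples @ corpus_emb.T   # (n_samples, n_docs)
    agg = sims.max(axis=0)                # best score per document
    return np.argsort(-agg)[:top_k], agg

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    corpus = rng.normal(size=(1000, 64))
    corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
    mu, log_var = encode_query("TP53 mutations in lung adenocarcinoma")
    samples = sample_embeddings(mu, log_var, n_samples=16, rng=rng)
    top_ids, _ = search(corpus, samples)
    print(top_ids)  # more samples widen the search at the cost of latency
```

Increasing n_samples is the test-time-compute knob referenced above: latency grows roughly linearly with the number of sampled embeddings, in exchange for broader coverage of multi-concept queries.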