Searching for the $k$-nearest neighbors (KNN) in multimodal data retrieval is computationally expensive, particularly due to the inherent difficulty of comparing similarity measures across different modalities. Recent advances in multimodal machine learning address this issue by mapping data into a shared embedding space; however, the high dimensionality of these embeddings (hundreds to thousands of dimensions) presents a challenge for time-sensitive vision applications. This work proposes Order-Preserving Dimension Reduction (OPDR), which reduces the dimensionality of embeddings while preserving the ranking of KNN in the lower-dimensional space. One notable component of OPDR is a new measure function that quantifies KNN quality as a global metric, from which we derive a closed-form map between the target dimensionality and key contextual parameters. We have integrated OPDR with multiple state-of-the-art dimension-reduction techniques, distance functions, and embedding models; experiments on a variety of multimodal datasets demonstrate that OPDR effectively retains high recall accuracy while significantly reducing computational costs.
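To make the abstract's central quantity concrete, the sketch below computes a global KNN-quality measure of the kind OPDR is built around: the average fraction of each point's original $k$-nearest neighbors that survive dimension reduction. This is an illustrative assumption, not the paper's exact measure function; PCA stands in for whichever dimension-reduction technique OPDR is paired with, and all names (`knn_indices`, `pca_reduce`, `knn_recall`) are hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    # Brute-force k-NN by pairwise Euclidean distance (self excluded).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def pca_reduce(X, m):
    # Stand-in reducer: project onto the top-m principal components via SVD.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T

def knn_recall(X, X_low, k):
    # Global KNN-quality measure (illustrative): mean overlap between each
    # point's k-NN set in the original and the reduced space.
    orig, low = knn_indices(X, k), knn_indices(X_low, k)
    return np.mean([len(set(a) & set(b)) / k for a, b in zip(orig, low)])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 512))          # stand-in for 512-d embeddings
recall = knn_recall(X, pca_reduce(X, 64), k=10)
```

In this framing, OPDR's closed-form map answers the inverse question: given a desired value of such a measure, what is the smallest target dimensionality $m$ that achieves it.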