Product quantization (PQ) is a widely used technique for ad-hoc retrieval. Recent studies propose supervised PQ, in which the embedding and quantization models are trained jointly with supervised learning. However, the joint training objective has not been appropriately formulated, so the improvements over previous non-supervised baselines are limited in practice. In this work, we propose Matching-oriented Product Quantization (MoPQ), which formulates a novel objective, the Multinoulli Contrastive Loss (MCL). Minimizing MCL maximizes the matching probability between a query and its ground-truth key, which contributes to optimal retrieval accuracy. Because exact computation of MCL is intractable, as it requires a vast number of contrastive samples, we further propose Differentiable Cross-device Sampling (DCS), which significantly augments the contrastive samples for a precise approximation of MCL. We conduct extensive experimental studies on four real-world datasets, and the results verify the effectiveness of MoPQ.
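For readers unfamiliar with the underlying technique, the following is a minimal sketch of plain, unsupervised product quantization in NumPy. It is not MoPQ: it includes neither MCL nor DCS, and it fits the codebooks with ordinary k-means rather than supervised joint training. The function names (`train_codebooks`, `encode`, `decode`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def train_codebooks(x, n_subspaces, n_codewords, n_iters=10, seed=0):
    """Fit one k-means codebook per subspace using plain Lloyd iterations.

    Illustrative only: a standard unsupervised PQ baseline, not the
    supervised objective described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    assert d % n_subspaces == 0, "dimension must split evenly into subspaces"
    sub_dim = d // n_subspaces
    codebooks = np.empty((n_subspaces, n_codewords, sub_dim), dtype=x.dtype)
    for m in range(n_subspaces):
        sub = x[:, m * sub_dim:(m + 1) * sub_dim]
        # Initialise centroids from distinct data points.
        centroids = sub[rng.choice(n, n_codewords, replace=False)].copy()
        for _ in range(n_iters):
            # Squared distances, shape (n, n_codewords).
            dists = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(1)
            for k in range(n_codewords):
                members = sub[assign == k]
                if len(members) > 0:  # keep the old centroid if the cluster is empty
                    centroids[k] = members.mean(0)
        codebooks[m] = centroids
    return codebooks


def encode(x, codebooks):
    """Map each vector to one codeword index per subspace."""
    n_subspaces, _, sub_dim = codebooks.shape
    codes = np.empty((x.shape[0], n_subspaces), dtype=np.int64)
    for m in range(n_subspaces):
        sub = x[:, m * sub_dim:(m + 1) * sub_dim]
        dists = ((sub[:, None, :] - codebooks[m][None, :, :]) ** 2).sum(-1)
        codes[:, m] = dists.argmin(1)
    return codes


def decode(codes, codebooks):
    """Reconstruct approximate vectors by concatenating the selected codewords."""
    parts = [codebooks[m][codes[:, m]] for m in range(codebooks.shape[0])]
    return np.concatenate(parts, axis=1)
```

With, for example, 8-dimensional vectors, 4 subspaces, and 16 codewords per subspace, each vector is stored as 4 small integers instead of 8 floats, and `decode(encode(x, codebooks), codebooks)` returns its quantized approximation.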