Product quantization (PQ) is a widely used technique for ad-hoc retrieval. Recent studies propose supervised PQ, where the embedding and quantization models can be jointly trained with supervised learning. However, there is a lack of an appropriate formulation of the joint training objective; thus, the improvements over previous unsupervised baselines are limited in practice. In this work, we propose Matching-oriented Product Quantization (MoPQ), in which a novel objective, the Multinoulli Contrastive Loss (MCL), is formulated. By minimizing MCL, we maximize the matching probability between a query and its ground-truth key, which leads to optimal retrieval accuracy. Given that the exact computation of MCL is intractable due to the vast number of contrastive samples it requires, we further propose Differentiable Cross-device Sampling (DCS), which significantly augments the contrastive samples for a precise approximation of MCL. We conduct extensive experimental studies on four real-world datasets, whose results verify the effectiveness of MoPQ. The code is available at https://github.com/microsoft/MoPQ.
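As a rough illustration of the two ideas named above, the sketch below shows how a Multinoulli (softmax) contrastive loss could be computed over keys gathered from all devices, so that every cross-device key acts as a contrastive sample while gradients still flow through the local shard. This is a minimal, hypothetical PyTorch sketch, not the released MoPQ implementation; helper names such as `gather_with_grad` and `multinoulli_contrastive_loss` are assumptions for illustration only (see the linked repository for the official code).

```python
# Minimal sketch (NOT the official MoPQ code): softmax-based contrastive loss
# where keys gathered from all devices serve as additional contrastive samples.
import torch
import torch.nn.functional as F
import torch.distributed as dist


def gather_with_grad(t: torch.Tensor) -> torch.Tensor:
    """All-gather keys from every device, keeping gradients for the local shard."""
    if not (dist.is_available() and dist.is_initialized()):
        return t
    gathered = [torch.zeros_like(t) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, t)
    gathered[dist.get_rank()] = t  # re-insert local tensor so it stays differentiable
    return torch.cat(gathered, dim=0)


def multinoulli_contrastive_loss(queries: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over the softmax of query-key similarities.

    queries: [B, d] local query embeddings
    keys:    [B, d] local (quantized) key embeddings, aligned row-wise with queries
    """
    all_keys = gather_with_grad(keys)        # [B * world_size, d]
    logits = queries @ all_keys.t()          # similarity of each query to every gathered key
    rank = dist.get_rank() if (dist.is_available() and dist.is_initialized()) else 0
    labels = torch.arange(queries.size(0), device=queries.device) + rank * queries.size(0)
    # Minimizing this cross-entropy maximizes the matching probability of the ground-truth key.
    return F.cross_entropy(logits, labels)
```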