Extreme Classification (XC) seeks to tag data points with the most relevant subset of labels from an extremely large label set. Performing deep XC with dense, learnt representations for data points and labels has attracted much attention due to its superiority over earlier XC methods that used sparse, hand-crafted features. Negative mining techniques have emerged as a critical component of all deep XC methods, allowing them to scale to millions of labels. However, despite recent advances, training deep XC models with large encoder architectures such as transformers remains challenging. This paper identifies that the memory overheads of popular negative mining techniques often force mini-batch sizes to remain small, slowing training down. In response, this paper introduces NGAME, a light-weight mini-batch creation technique that offers provably accurate in-batch negative samples. This allows training with larger mini-batches, offering significantly faster convergence and higher accuracies than existing negative sampling techniques. NGAME was found to be up to 16% more accurate than state-of-the-art methods on a wide array of benchmark datasets for extreme classification, as well as 3% more accurate at retrieving search engine queries in response to a user webpage visit to show personalized ads. In live A/B tests on a popular search engine, NGAME yielded up to 23% gains in click-through rates.
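The core idea of negative-aware mini-batch creation can be illustrated with a minimal sketch: if mini-batches are formed from semantically similar data points (e.g., by clustering their embeddings), then the positive labels of other points in the batch serve as hard in-batch negatives at no extra memory cost. The clustering routine, function names, and parameters below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=10):
    # Simple k-means over data-point embeddings (illustrative stand-in
    # for whatever clustering the mini-batch creator uses).
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centers[j] = members.mean(0)
    return assign

def clustered_minibatches(X, batch_size, k):
    """Group similar points into the same mini-batch so that each point's
    in-batch negatives (the other points' positive labels) are hard ones."""
    assign = kmeans(X, k)
    batches = []
    for j in range(k):
        idx = np.where(assign == j)[0]
        for s in range(0, len(idx), batch_size):
            batches.append(idx[s:s + batch_size])
    return batches

# Toy example: 100 points with 8-dimensional embeddings.
X = rng.standard_normal((100, 8)).astype(np.float32)
batches = clustered_minibatches(X, batch_size=16, k=5)
```

Because negatives come for free from within the batch, memory no longer scales with a separately maintained negative pool, which is what permits the larger mini-batches the abstract credits for faster convergence.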