Extreme multi-label classification (XMC) is a popular framework for solving many real-world problems that require accurate prediction from a very large number of potential output choices. A common approach to handling the large label space is to arrange the labels into a shallow tree-based index and then learn an ML model to efficiently search this index via beam search. Existing methods initialize the tree index by clustering the label space into a few mutually exclusive clusters based on pre-defined features and keep it fixed throughout the training procedure. This approach results in a sub-optimal indexing structure over the label space and limits the search performance to the quality of choices made during the initialization of the index. In this paper, we propose a novel method, ELIAS, which relaxes the tree-based index to a specialized weighted graph-based index that is learned end-to-end with the final task objective. More specifically, ELIAS models the discrete cluster-to-label assignments in the existing tree-based index as soft learnable parameters that are learned jointly with the rest of the ML model. ELIAS achieves state-of-the-art performance on several large-scale extreme classification benchmarks with millions of labels. In particular, ELIAS can be up to 2.5% better at precision@1 and up to 4% better at recall@100 than existing XMC methods. A PyTorch implementation of ELIAS along with other resources is available at https://github.com/nilesh2797/ELIAS.
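To make the core idea concrete, the following is a minimal PyTorch sketch of how discrete cluster-to-label assignments can be relaxed into soft learnable parameters and combined with cluster scoring and one-vs-all label classifiers. This is an illustrative assumption of the mechanism described above, not the authors' actual implementation (see the linked repository for that); all class, parameter, and dimension names here are hypothetical.

```python
# Hypothetical sketch: a soft, learnable cluster-to-label assignment matrix
# (clusters x labels) trained end-to-end with cluster and label scorers.
import torch
import torch.nn as nn


class SoftLabelIndex(nn.Module):
    def __init__(self, emb_dim: int, num_clusters: int, num_labels: int, beam: int = 4):
        super().__init__()
        self.beam = beam
        # Scores an input embedding against each cluster of the index.
        self.cluster_scorer = nn.Linear(emb_dim, num_clusters)
        # Soft cluster-to-label assignments, learned jointly with the rest of the model
        # (replaces the fixed, mutually exclusive assignments of a tree index).
        self.assign_logits = nn.Parameter(torch.zeros(num_clusters, num_labels))
        # One-vs-all label classifiers.
        self.label_clf = nn.Linear(emb_dim, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, emb_dim) embeddings from any text encoder.
        cluster_probs = torch.sigmoid(self.cluster_scorer(x))          # (batch, C)
        topk_probs, topk_idx = cluster_probs.topk(self.beam, dim=-1)   # beam search over clusters
        # Soft edge weights between the shortlisted clusters and all labels.
        assign = torch.sigmoid(self.assign_logits[topk_idx])           # (batch, beam, L)
        # A label's index score is its best (cluster prob * edge weight) path.
        index_score = (topk_probs.unsqueeze(-1) * assign).amax(dim=1)  # (batch, L)
        label_score = torch.sigmoid(self.label_clf(x))                 # (batch, L)
        return index_score * label_score                               # final relevance scores


# Toy usage: score 4 random inputs against a label space of 1000 labels.
model = SoftLabelIndex(emb_dim=64, num_clusters=32, num_labels=1000)
scores = model(torch.randn(4, 64))
print(scores.shape)  # torch.Size([4, 1000])
```

Because the assignment matrix is differentiable, gradients from the final task loss can reshape which labels attach to which clusters during training, rather than freezing those choices at initialization.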