Extreme multi-label classification (XMLC) is the task of tagging instances with a small subset of relevant labels chosen from an extremely large pool of possible labels. Problems of this scale can be efficiently handled by organizing labels as a tree, as in the hierarchical softmax used for multi-class problems. In this paper, we thoroughly investigate probabilistic label trees (PLTs), which can be treated as a generalization of hierarchical softmax to multi-label problems. We first introduce the PLT model and discuss training and inference procedures and their computational costs. Next, we prove the consistency of PLTs for a wide spectrum of performance metrics. To this end, we upper-bound their regret by a function of surrogate-loss regrets of the node classifiers. Furthermore, we consider the problem of training PLTs in a fully online setting, without any prior knowledge of training instances, their features, or labels. In this case, both the node classifiers and the tree structure are trained online. We prove a specific equivalence between the fully online algorithm and an algorithm with a tree structure given in advance. Finally, we discuss several implementations of PLTs and introduce a new one, napkinXC, which we empirically evaluate and compare with state-of-the-art algorithms.
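As a brief sketch of the underlying model (notation such as $l_j$, $\mathrm{pa}(v)$, and $z_v$ is chosen here for illustration and is not taken from the abstract), a PLT factorizes each label's conditional probability along the path from the root to the label's leaf:
\[
  \eta_j(x) \;=\; P(y_j = 1 \mid x) \;=\; \prod_{v \in \mathrm{Path}(l_j)} \eta(x, v),
  \qquad
  \eta(x, v) \;=\;
  \begin{cases}
    P(z_v = 1 \mid x) & \text{if } v \text{ is the root,}\\
    P(z_v = 1 \mid z_{\mathrm{pa}(v)} = 1,\, x) & \text{otherwise,}
  \end{cases}
\]
where $l_j$ denotes the leaf assigned to label $j$, $\mathrm{Path}(l_j)$ is the set of nodes on the path from the root to $l_j$, and $z_v = 1$ indicates that at least one label in the subtree rooted at node $v$ is relevant. Each node classifier estimates one factor $\eta(x, v)$, so training and prediction costs scale with the depth of the tree rather than with the total number of labels.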