Extreme multi-label classification (XMC) aims to learn a model that can tag data points with a subset of relevant labels from an extremely large label set. Real-world e-commerce applications such as personalized recommendation and product advertising can be formulated as XMC problems, where the objective is to predict, for a given user, a small subset of items from a catalog of several million products. For such applications, a common approach is to organize the labels into a tree, enabling training and inference times that are logarithmic in the number of labels. While training a model once a label tree is available is well studied, designing the structure of the tree is a difficult task that is not yet well understood and can dramatically impact both model latency and statistical performance. Existing approaches to tree construction sit at one of two extremes, optimizing exclusively either for statistical performance or for latency. We propose an efficient, information-theory-inspired algorithm that constructs intermediate operating points trading off the benefits of both. Our algorithm enables interpolation between these objectives, which was not previously possible. We corroborate our theoretical analysis with numerical results, showing that on the Wiki-500K benchmark dataset our method can reduce a proxy for expected latency by up to 28% while maintaining the same accuracy as Parabel. On several datasets derived from e-commerce customer logs, our modified label tree improves this expected-latency metric by up to 20% while maintaining the same accuracy. Finally, we discuss challenges in realizing these latency improvements in deployed models.
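To illustrate how a label tree yields inference cost that is logarithmic in the number of labels, the following is a minimal sketch (not the paper's implementation) of beam-search inference over a label tree: at each level only the top-`beam` nodes are expanded, so the number of classifier evaluations scales with tree depth and fan-out rather than with the full label set. The `Node` structure, the per-node linear scorers, and the `beam_search` helper are illustrative assumptions.

```python
# Illustrative sketch of tree-based inference for XMC; all names and the toy
# linear scorers are assumptions, not the paper's actual method.
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    weights: List[float]                          # per-node classifier parameters
    label: Optional[int] = None                   # set only at leaves
    children: List["Node"] = field(default_factory=list)

def node_score(node: Node, x: List[float]) -> float:
    """Per-node probability from a toy linear classifier (sigmoid of a dot product)."""
    z = sum(w * xi for w, xi in zip(node.weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def beam_search(root: Node, x: List[float], beam: int = 2, top_k: int = 5) -> List[Tuple[float, int]]:
    """Return up to top_k (log-score, label) pairs, expanding at most `beam` nodes per level."""
    frontier: List[Tuple[float, Node]] = [(0.0, root)]
    predictions: List[Tuple[float, int]] = []
    while frontier:
        candidates: List[Tuple[float, Node]] = []
        for logp, node in frontier:
            if node.label is not None:            # leaf: emit a label prediction
                predictions.append((logp, node.label))
                continue
            for child in node.children:
                candidates.append((logp + math.log(node_score(child, x)), child))
        # Keep only the best `beam` partial paths; total work is O(beam * fanout * depth).
        candidates.sort(key=lambda t: t[0], reverse=True)
        frontier = candidates[:beam]
    predictions.sort(key=lambda t: t[0], reverse=True)
    return predictions[:top_k]

# Tiny usage example: a depth-2 tree over four labels.
leaves = [Node(weights=[0.5 * i, -0.3], label=i) for i in range(4)]
internal = [Node(weights=[1.0, 0.2], children=leaves[:2]),
            Node(weights=[-0.4, 0.9], children=leaves[2:])]
root = Node(weights=[0.0, 0.0], children=internal)
print(beam_search(root, x=[1.0, 2.0], beam=2, top_k=3))
```

Because the work per query is governed by the depth and fan-out of the tree rather than the label count, the shape of the tree directly determines the latency proxy discussed above, which is why tree construction matters.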