Tree-based models underpin many modern semantic search engines and recommender systems due to their sub-linear inference times. In industrial applications, these models operate at extreme scales, where every bit of performance is critical. Memory constraints at extreme scales also require that models be sparse, so tree-based models are often backed by sparse matrix algebra routines. However, there are currently no sparse matrix techniques specifically designed for the sparsity structure one encounters in tree-based models for extreme multi-label ranking/classification (XMR/XMC) problems. To address this issue, we present the masked sparse chunk multiplication (MSCM) technique, a sparse matrix technique specifically tailored to XMR trees. MSCM is easy to implement, embarrassingly parallelizable, and offers a significant performance boost to any existing tree inference pipeline at no cost to accuracy. We perform a comprehensive study of MSCM applied to several different sparse inference schemes and benchmark our methods on a general-purpose extreme multi-label ranking framework. We observe that MSCM gives consistent and dramatic speedups across both the online and batch inference settings, in single- and multi-threaded settings, and on many different tree models and datasets. To demonstrate its utility in industrial applications, we apply MSCM to an enterprise-scale semantic product search problem with 100 million products and achieve sub-millisecond latency of 0.88 ms per query on a single thread, an 8x reduction in latency over vanilla inference techniques. The MSCM technique requires no sacrifice in model accuracy, since it gives exactly the same results as standard sparse matrix techniques. Therefore, we believe that MSCM will enable users of XMR trees to save a substantial amount of compute resources in their inference pipelines at very little cost.