State-of-the-art Extreme Multi-Label Text Classification models rely on multi-label attention to focus on key tokens in input text, but learning good attention weights is challenging. We introduce PLANT - Pretrained and Leveraged Attention - a plug-and-play strategy for initializing attention. PLANT works by planting label-specific attention using a pretrained Learning-to-Rank model guided by mutual information gain. This architecture-agnostic approach integrates seamlessly with large language model backbones such as Mistral-7B, LLaMA3-8B, DeepSeek-V3, and Phi-3. PLANT outperforms state-of-the-art methods across tasks including ICD coding, legal topic classification, and content recommendation. Gains are especially pronounced in few-shot settings, with substantial improvements on rare labels. Ablation studies confirm that attention initialization is a key driver of these gains. For code and trained models, see https://github.com/debjyotiSRoy/xcube/tree/plant
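To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of how a multi-label attention layer could be "planted" with precomputed token-label relevance scores, such as mutual-information-gain estimates or outputs of a pretrained Learning-to-Rank model, instead of learning attention weights from random initialization. All names here (`LabelAttention`, `token_prior`) are hypothetical and only illustrate the initialization-as-prior idea.

```python
# Hedged sketch: bias per-label attention logits with a precomputed
# token-label relevance prior (e.g., mutual information gain or LTR scores).
# This is an illustration of the general technique, not PLANT's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelAttention(nn.Module):
    """Multi-label attention over token representations.

    Each label has its own query vector; attention logits are dot products
    between token hidden states and that query. Optionally, a precomputed
    relevance prior is added to the logits before the softmax.
    """

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        # One query vector per label; ordinarily random-initialized and learned.
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_dim) * 0.02)

    def forward(self, hidden: torch.Tensor, token_prior: torch.Tensor = None):
        # hidden:      (batch, seq_len, hidden_dim) token representations
        # token_prior: (batch, num_labels, seq_len) precomputed relevance scores
        logits = torch.einsum("bsh,lh->bls", hidden, self.label_queries)
        if token_prior is not None:
            # "Plant" the attention: add the pretrained/statistical prior
            # so the model starts out focusing on label-relevant tokens.
            logits = logits + token_prior
        attn = F.softmax(logits, dim=-1)                    # (batch, num_labels, seq_len)
        return torch.einsum("bls,bsh->blh", attn, hidden)   # label-specific document vectors
```

In this reading, the prior does the heavy lifting early in training, and the learned queries refine it; the mechanism is backbone-agnostic because it only consumes token hidden states, whatever encoder produced them.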