Understanding the motivations underlying scholarly citations is critical for evaluating research impact and fostering transparent scholarly communication. This study introduces CiteFusion, an ensemble framework designed to address the multiclass Citation Intent Classification (CIC) task on benchmark datasets, SciCite and ACL-ARC. The framework decomposes the task into binary classification subtasks, utilizing complementary pairs of SciBERT and XLNet models fine-tuned independently for each citation intent. These base models are aggregated through a feedforward neural network meta-classifier, ensuring robust performance in imbalanced and data-scarce scenarios. To enhance interpretability, SHAP (SHapley Additive exPlanations) is employed to analyze token-level contributions and interactions among base models, providing transparency into classification dynamics. We further investigate the semantic role of structural context by incorporating section titles into input sentences, demonstrating their significant impact on classification accuracy and model reliability. Experimental results show that CiteFusion achieves state-of-the-art performance, with Macro-F1 scores of 89.60% on SciCite and 76.24% on ACL-ARC. The original intents from both datasets are mapped to Citation Typing Ontology (CiTO) object properties to ensure interoperability and reusability. This mapping highlights overlaps between the two datasets labels, enhancing their understandability and reusability. Finally, we release a web-based application that classifies citation intents leveraging CiteFusion models developed on SciCite.
翻译:暂无翻译