Multimodal sarcasm detection is challenging, especially in low-resource settings where scarce annotated data makes the subtle contradictions between image and text difficult to learn. Parameter-efficient fine-tuning (PEFT) methods such as adapters, LoRA, and prompt tuning reduce overfitting but fall short of optimal performance because few-shot data provides only limited supervision. We propose PEKD, a unified framework that strengthens PEFT methods via knowledge distillation from an expert model trained on large-scale sarcasm data, which serves as the teacher. To mitigate unreliable teacher signals, we introduce an entropy-aware gating mechanism that dynamically adjusts the distillation strength according to teacher confidence. Experiments on two public datasets show that PEKD enables PEFT methods to outperform both prior parameter-efficient approaches and large multimodal models, achieving strong results in the few-shot scenario. The framework is modular and adaptable to a wide range of multimodal models and tasks.
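For intuition, the sketch below shows one plausible way an entropy-aware gate could modulate a distillation loss: the teacher's predictive entropy is normalized and used to down-weight the soft-label term for examples where the teacher is uncertain. The function name, temperature, and weighting scheme are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def entropy_gated_distillation_loss(student_logits, teacher_logits,
                                    labels, temperature=2.0, alpha=0.5):
    """Illustrative sketch: a distillation loss whose strength is reduced
    when the teacher is uncertain (high predictive entropy). PEKD's actual
    gating function may differ."""
    # Teacher predictive distribution and its normalized entropy.
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    entropy = -(teacher_probs * teacher_probs.clamp_min(1e-12).log()).sum(dim=-1)
    max_entropy = torch.log(torch.tensor(float(teacher_logits.size(-1))))
    gate = 1.0 - entropy / max_entropy  # close to 1 when the teacher is confident

    # Soft-label KL distillation term, gated per example.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=-1)
    kd = (gate * kd).mean() * temperature ** 2

    # Standard supervised loss on the few-shot labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```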