In the era of large language models (LLMs), supervised neural methods remain the state of the art (SOTA) for coreference resolution. Yet their full potential remains underexplored, particularly for incremental clustering, which faces the critical challenge of balancing efficiency against performance on long texts. To address this limitation, we propose \textbf{MEIC-DT}, a novel dual-threshold, memory-efficient incremental clustering approach built on a lightweight Transformer. MEIC-DT features a dual-threshold constraint mechanism that precisely controls the Transformer's input scale within a predefined memory budget. This mechanism incorporates a Statistics-Aware Eviction Strategy (\textbf{SAES}), which exploits the distinct statistical profiles of the training and inference phases for intelligent cache management. Furthermore, we introduce an Internal Regularization Policy (\textbf{IRP}) that condenses clusters by retaining their most representative mentions, thereby preserving semantic integrity. Extensive experiments on standard benchmarks demonstrate that MEIC-DT achieves highly competitive coreference performance under stringent memory constraints.
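To make the mechanism concrete, the following minimal Python sketch illustrates a dual-threshold, memory-bounded incremental clustering loop of the kind described above. It is not the authors' implementation: the cosine-to-centroid scorer stands in for the lightweight Transformer, and the eviction score, the condensation rule, and the threshold semantics (condensation is triggered once a hard cap is exceeded and proceeds until the cache falls back below a soft level) are assumptions made purely for illustration.

\begin{verbatim}
# Illustrative sketch only (not MEIC-DT itself). All class names, thresholds,
# and heuristics below are hypothetical placeholders.
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Cluster:
    mentions: List[np.ndarray] = field(default_factory=list)  # mention embeddings
    last_update: int = 0                                       # step of last assignment

    def centroid(self) -> np.ndarray:
        return np.mean(self.mentions, axis=0)

    def condense(self, k: int) -> None:
        # IRP-style condensation (assumed): keep the k mentions closest to the centroid.
        if len(self.mentions) <= k:
            return
        c = self.centroid()
        dists = [np.linalg.norm(m - c) for m in self.mentions]
        keep = np.argsort(dists)[:k]
        self.mentions = [self.mentions[i] for i in keep]

class IncrementalClusterer:
    def __init__(self, soft_budget: int, hard_budget: int, sim_threshold: float = 0.5):
        # Dual thresholds (assumed semantics): condensation starts once the cache
        # exceeds hard_budget and stops when it falls back below soft_budget.
        self.soft_budget, self.hard_budget = soft_budget, hard_budget
        self.sim_threshold = sim_threshold
        self.clusters: List[Cluster] = []

    def _cache_size(self) -> int:
        return sum(len(c.mentions) for c in self.clusters)

    def _eviction_score(self, c: Cluster, step: int) -> float:
        # SAES stand-in (assumed): stale, large clusters are condensed first.
        return float((step - c.last_update) * len(c.mentions))

    def _enforce_budget(self, step: int) -> None:
        if self._cache_size() <= self.hard_budget:
            return
        while self._cache_size() > self.soft_budget:
            candidates = [c for c in self.clusters if len(c.mentions) > 1]
            if not candidates:
                break  # every cluster is already a single mention
            victim = max(candidates, key=lambda c: self._eviction_score(c, step))
            victim.condense(k=len(victim.mentions) // 2)

    def add_mention(self, emb: np.ndarray, step: int) -> int:
        # Assign the new mention to the best-scoring cluster (or start one),
        # then enforce the memory budget. Cosine similarity to the cluster
        # centroid stands in for the paper's Transformer-based scorer.
        best_i, best_sim = -1, self.sim_threshold
        for i, c in enumerate(self.clusters):
            cen = c.centroid()
            sim = float(emb @ cen / (np.linalg.norm(emb) * np.linalg.norm(cen) + 1e-8))
            if sim > best_sim:
                best_i, best_sim = i, sim
        if best_i < 0:
            self.clusters.append(Cluster(mentions=[emb], last_update=step))
            best_i = len(self.clusters) - 1
        else:
            self.clusters[best_i].mentions.append(emb)
            self.clusters[best_i].last_update = step
        self._enforce_budget(step)
        return best_i

if __name__ == "__main__":
    # Toy usage: mentions drawn around five synthetic entities; the cache stays bounded.
    rng = np.random.default_rng(0)
    entities = rng.normal(size=(5, 64))
    clusterer = IncrementalClusterer(soft_budget=50, hard_budget=80)
    for step in range(200):
        emb = entities[step % 5] + 0.05 * rng.normal(size=64)
        clusterer.add_mention(emb, step)
    print(len(clusterer.clusters), "clusters,", clusterer._cache_size(), "cached mentions")
\end{verbatim}

In the full system, the mention representations and pairwise scores would come from the lightweight Transformer, and SAES would rely on the statistics gathered during training and inference rather than the simple staleness-times-size heuristic used here.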