Although transformer-based models have shown strong performance on word- and sentence-level tasks, effectively representing long documents, especially in fields like law and medicine, remains difficult. Sparse attention mechanisms can handle longer inputs, but they are resource-intensive and often fail to capture full-document context. Hierarchical transformer models offer better efficiency but do not explicitly model how different sections of a document relate to one another. In contrast, humans often skim texts, focusing on important sections to understand the overall message. Drawing on this strategy, we introduce a new self-supervised contrastive learning framework that enhances long document representation. Our method randomly masks a section of the document and uses a natural language inference (NLI)-based contrastive objective to align it with relevant parts while distancing it from unrelated ones. This mimics how humans synthesize information, resulting in representations that are both richer and more computationally efficient. Experiments on legal and biomedical texts confirm significant gains in both accuracy and efficiency.
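For concreteness, the sketch below shows one plausible way the section-masking contrastive objective described above could be realized as an InfoNCE-style loss: the embedding of the masked section serves as the anchor, embeddings of related sections act as positives, and unrelated sections act as negatives. The function name, embedding dimension, and temperature are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def masked_section_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Hypothetical InfoNCE-style objective: pull the masked-section embedding
    (anchor, shape (d,)) toward related-section embeddings (positives, (n_pos, d))
    and push it away from unrelated ones (negatives, (n_neg, d))."""
    anchor = F.normalize(anchor, dim=-1)
    positives = F.normalize(positives, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = positives @ anchor / temperature   # similarity to each positive, (n_pos,)
    neg_sim = negatives @ anchor / temperature   # similarity to each negative, (n_neg,)

    # Contrast each positive against all negatives; the correct class is index 0.
    logits = torch.cat(
        [pos_sim.unsqueeze(1), neg_sim.unsqueeze(0).expand(pos_sim.size(0), -1)], dim=1
    )
    targets = torch.zeros(pos_sim.size(0), dtype=torch.long)
    return F.cross_entropy(logits, targets)


# Toy usage with random section embeddings (768 dimensions is an assumption).
anchor = torch.randn(768)
related = torch.randn(3, 768)     # sections the masked part should align with
unrelated = torch.randn(5, 768)   # sections it should be pushed away from
print(masked_section_contrastive_loss(anchor, related, unrelated).item())
```

In practice, which sections count as positives or negatives would be decided by the NLI-based relevance signal mentioned in the abstract; the random tensors here stand in only to keep the sketch self-contained.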