Learning client embeddings from sequences of their historical communications is central to financial applications. While large language models (LLMs) offer general world knowledge, applying them directly to long event sequences is computationally expensive and impractical in real-world pipelines. In this paper, we propose LATTE, a contrastive learning framework that aligns raw event embeddings with semantic embeddings produced by frozen LLMs. Behavioral features are summarized into short prompts, embedded by the LLM, and used as supervision via a contrastive loss. The proposed approach significantly reduces inference cost and input size compared to feeding the complete sequence to an LLM. We experimentally show that our method outperforms state-of-the-art techniques for learning event sequence representations on real-world financial datasets while remaining deployable in latency-sensitive environments.
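To make the alignment objective concrete, the sketch below illustrates one way such a contrastive loss could look: a symmetric InfoNCE term that pulls each client's event-sequence embedding toward the frozen-LLM embedding of that client's summarized prompt. This is a minimal illustration under stated assumptions, not the authors' implementation; the function name, the temperature value, and the assumption that both embeddings share the same dimension (in practice a projection head would typically map one space into the other) are placeholders.

```python
# Minimal sketch (not the paper's code): InfoNCE-style alignment between
# event-sequence embeddings and frozen-LLM prompt embeddings.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(event_emb: torch.Tensor,
                               llm_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: the i-th event embedding should match the i-th
    LLM prompt embedding (same client) and repel all other pairs in the batch."""
    event_emb = F.normalize(event_emb, dim=-1)   # (B, d) from the event-sequence encoder
    llm_emb = F.normalize(llm_emb, dim=-1)       # (B, d) from the frozen LLM (no gradients)
    logits = event_emb @ llm_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Average the two directions: events -> prompts and prompts -> events.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Hypothetical usage: encoder(...) is a trainable sequence model,
# frozen_llm_embed(...) returns cached embeddings of the short behavioral prompts.
# loss = contrastive_alignment_loss(encoder(event_sequences), frozen_llm_embed(prompts))
```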