This technical report presents the training methodology and evaluation results of the open-source Jasper-Token-Compression-600M model, released in November 2025. Building on the distillation-based recipes of the English-only Stella and Jasper models, we extend this approach to a bilingual (English and Chinese) setting and further improve performance through contrastive learning. A key innovation of our model is a one-dimensional convolution-based token compression module, whose compression rate is adjusted dynamically during training so that the model learns more robust and efficient compressed text representations. By combining knowledge distillation with token compression, we achieve significant improvements in both embedding quality and inference efficiency. Our model runs more efficiently than a conventional 0.6B-parameter model while achieving performance comparable to that of an 8B-parameter model. For more information on the model release, visit: https://huggingface.co/infgrad/Jasper-Token-Compression-600M.
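To make the token-compression idea concrete, the sketch below shows a minimal, hypothetical PyTorch module that shortens a token sequence with a strided 1D convolution, where the stride plays the role of the dynamically sampled compression rate. The class name, kernel size, and rate-sampling scheme are illustrative assumptions, not the released model's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Conv1dTokenCompressor(nn.Module):
    """Hypothetical sketch of a 1D-convolution token compression module.

    Assumes compression is realized by applying a Conv1d over the token axis
    with a variable stride equal to the compression rate; details are
    illustrative, not the released model's exact design.
    """

    def __init__(self, hidden_size: int, kernel_size: int = 4):
        super().__init__()
        self.kernel_size = kernel_size
        # Parameter container; the stride is chosen at call time.
        self.conv = nn.Conv1d(hidden_size, hidden_size, kernel_size)

    def forward(self, token_states: torch.Tensor, compression_rate: int) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_size)
        x = token_states.transpose(1, 2)  # (batch, hidden_size, seq_len)
        # Use the dynamically chosen compression rate as the conv stride,
        # so the output sequence is roughly seq_len / compression_rate tokens.
        x = F.conv1d(
            x,
            self.conv.weight,
            self.conv.bias,
            stride=compression_rate,
            padding=self.kernel_size // 2,
        )
        return x.transpose(1, 2)  # (batch, compressed_len, hidden_size)


# Example: compress 512 token embeddings with a randomly sampled rate,
# mimicking a per-batch dynamic compression rate during training.
if __name__ == "__main__":
    module = Conv1dTokenCompressor(hidden_size=1024)
    tokens = torch.randn(2, 512, 1024)
    rate = int(torch.randint(1, 5, (1,)))
    compressed = module(tokens, compression_rate=rate)
    print(tokens.shape, "->", compressed.shape)
```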