Decoder transformers have continued increasing in scale, reaching hundreds of billions of parameters. Due to their scale, the same decoders set state-of-the-art results on various language tasks via prompting or fine-tuning. Yet, these large foundation models remain unusable for the related fields of semantic search and sentence embeddings. This precludes potential new state-of-the-art results and forces organizations to train and maintain separate models. To this end, we propose SGPT to use decoders for sentence embeddings and semantic search via prompting or fine-tuning. At 5.8 billion parameters, SGPT improves on the previously best sentence embeddings by a margin of 7% and outperforms a concurrent method with 175 billion parameters as measured on the BEIR search benchmark. Code, models and result files are freely available at https://github.com/Muennighoff/sgpt.
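To make the proposal concrete, the sketch below illustrates the core idea of turning a decoder-only transformer into a sentence encoder: pool the model's last hidden states into a single vector per sentence and compare vectors by cosine similarity. It is a minimal illustration, not the released SGPT models; the model name (gpt2 as a small stand-in), the position-weighted pooling variant, and the similarity metric are assumptions chosen for brevity.

```python
# Illustrative sketch: sentence embeddings from a decoder-only transformer.
# Assumptions: HuggingFace "gpt2" as a stand-in decoder, position-weighted
# mean pooling, cosine similarity for search-style scoring.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModel.from_pretrained(model_name)
model.eval()

def embed(sentences):
    """One embedding per sentence via position-weighted mean pooling over the
    decoder's last hidden states (later tokens get larger weights, since causal
    attention lets them attend to the whole sentence)."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    positions = torch.arange(1, hidden.shape[1] + 1).view(1, -1, 1).float()
    weights = positions * mask                              # zero out padding
    return (hidden * weights).sum(dim=1) / weights.sum(dim=1)

# Toy semantic-search example: score two documents against one query.
query = embed(["How can decoders produce sentence embeddings?"])
docs = embed([
    "SGPT pools decoder hidden states into a single vector.",
    "Unrelated text about cooking pasta.",
])
scores = torch.nn.functional.cosine_similarity(query, docs)  # higher = more similar
print(scores)
```

The fine-tuning and prompting variants described in the paper build on this kind of pooling or on the decoder's token probabilities; the repository linked above contains the actual released models and training code.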