Language models have become increasingly popular in recent years for tasks such as information retrieval. As use cases become oriented toward specific domains, fine-tuning has become the default approach for reaching acceptable performance. Fine-tuning these models for specific tasks and datasets requires careful tuning of hyperparameters and training techniques. In this paper, we present an in-depth analysis of the performance of four transformer-based language models on the task of biomedical information retrieval: DeepMind's RETRO (7B parameters), GPT-J (6B parameters), GPT-3 (175B parameters), and BLOOM (176B parameters). We compare their performance in terms of relevance, accuracy, and interpretability, using a corpus of 480,000 research papers on protein structure/function prediction as our dataset. Our findings suggest that smaller models (<10B parameters) fine-tuned on domain-specific datasets tend to outperform larger language models on highly specific questions by a significant margin (+50% on average) across relevance, accuracy, and interpretability. However, larger models generally produce better results on broader prompts.