In this paper, we propose a methodology for aligning a medium-sized GPT model, originally trained in English on an open domain, to a small closed domain in Spanish. The model is fine-tuned for the question answering task. To achieve this, we also trained and implemented a second neural network (which we call the reward model) that scores whether an answer is appropriate for a given question. This component improves the decoding and generation of the system's answers. The model was evaluated with automatic metrics such as BLEU and perplexity, and human judgment was used to compare the proposed decoding technique with alternatives. The results favor the proposed method and show that it is feasible to use a reward model to align the generation of responses.
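To make the role of the reward model concrete, the following is a minimal sketch of how such a model can guide decoding by reranking sampled candidates (a best-of-N scheme). It assumes the Hugging Face `transformers` library, a causal LM fine-tuned for Spanish QA, and a reward model implemented as a sequence classifier with a single-score head; all model paths are placeholders, not the authors' released artifacts.

```python
# Sketch: reward-model reranking of generated answers (best-of-N decoding).
# Assumptions: Hugging Face transformers; placeholder model paths; the reward
# model is a sequence classifier with num_labels=1 (one score per pair).
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

GENERATOR_PATH = "path/to/spanish-qa-gpt"   # hypothetical fine-tuned GPT
REWARD_PATH = "path/to/qa-reward-model"     # hypothetical reward model

tokenizer = AutoTokenizer.from_pretrained(GENERATOR_PATH)
generator = AutoModelForCausalLM.from_pretrained(GENERATOR_PATH)
reward_tokenizer = AutoTokenizer.from_pretrained(REWARD_PATH)
reward_model = AutoModelForSequenceClassification.from_pretrained(REWARD_PATH)

def answer(question: str, num_candidates: int = 8) -> str:
    """Sample several candidate answers, score each with the reward model,
    and return the highest-scoring candidate."""
    inputs = tokenizer(question, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    with torch.no_grad():
        outputs = generator.generate(
            **inputs,
            do_sample=True,
            top_p=0.9,
            max_new_tokens=64,
            num_return_sequences=num_candidates,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Strip the prompt tokens; keep only the generated continuations.
    candidates = [
        tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
        for seq in outputs
    ]
    # Score each (question, answer) pair; the classifier logit is the reward.
    pairs = reward_tokenizer(
        [question] * len(candidates),
        candidates,
        return_tensors="pt",
        padding=True,
        truncation=True,
    )
    with torch.no_grad():
        scores = reward_model(**pairs).logits.squeeze(-1)
    return candidates[scores.argmax().item()]
```

A reranking scheme like this leaves the generator untouched at inference time: the reward model only reorders sampled outputs, which is one straightforward way a scoring network can "improve the decoding and generation" as described above.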