In this paper, we propose a methodology to align a medium-sized GPT model, originally trained in English on an open domain, to a small closed domain in Spanish. The model is fine-tuned for the question-answering task. To achieve this, we also trained and implemented a second neural network (which we call the reward model) that scores answers and determines whether an answer is appropriate for a given question. This component improves the decoding and generation of the system's answers. We evaluated the model with numerical metrics such as BLEU and perplexity, and used human judgment to compare the proposed decoding technique against alternatives. The results favored the proposed method, showing that it is feasible to use a reward model to align response generation.
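As a rough illustration of the reward-guided decoding described above, the sketch below samples several candidate answers from a fine-tuned causal language model and keeps the one the reward model scores highest. It assumes Hugging Face `transformers`; the model names (`my-org/gpt-qa-es`, `my-org/reward-model-qa-es`), the question/answer pairing format, and the best-of-n reranking scheme are illustrative assumptions, not details confirmed by the paper.

```python
# Minimal sketch: rerank sampled answers with a reward model.
# All model names are hypothetical placeholders, not the paper's checkpoints.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

GEN_NAME = "my-org/gpt-qa-es"          # hypothetical fine-tuned Spanish QA generator
RM_NAME = "my-org/reward-model-qa-es"  # hypothetical reward model

gen_tok = AutoTokenizer.from_pretrained(GEN_NAME)
generator = AutoModelForCausalLM.from_pretrained(GEN_NAME)
rm_tok = AutoTokenizer.from_pretrained(RM_NAME)
reward_model = AutoModelForSequenceClassification.from_pretrained(RM_NAME, num_labels=1)

def answer(question: str, n_candidates: int = 8) -> str:
    """Sample several candidate answers, return the one the reward model prefers."""
    inputs = gen_tok(question, return_tensors="pt")
    with torch.no_grad():
        outputs = generator.generate(
            **inputs,
            do_sample=True,
            top_p=0.9,
            max_new_tokens=64,
            num_return_sequences=n_candidates,
            pad_token_id=gen_tok.eos_token_id,
        )
    # Keep only the generated continuation, not the echoed prompt.
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [
        gen_tok.decode(out[prompt_len:], skip_special_tokens=True) for out in outputs
    ]

    # Score each (question, answer) pair; a higher logit means the reward
    # model judges the answer more appropriate for the question.
    with torch.no_grad():
        batch = rm_tok(
            [question] * len(candidates),
            candidates,
            return_tensors="pt",
            padding=True,
            truncation=True,
        )
        scores = reward_model(**batch).logits.squeeze(-1)
    return candidates[int(scores.argmax())]
```

Under this reading, the reward model acts as a reranker at decoding time: the generator proposes, and the highest-scoring candidate is returned as the system's answer.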