The explosion of novel NLP word-embedding and deep learning techniques has spurred significant effort toward potential applications, one of which is the financial sector. Although much work has been done on state-of-the-art models such as GPT and BERT, relatively little has examined how well these methods perform when fine-tuned after pre-training, or how sensitive their parameters are. We investigate the performance and sensitivity of neural architectures transferred from pre-trained GPT-2 and BERT models. We test fine-tuning performance as a function of frozen transformer layers, batch size, and learning rate. We find that BERT's parameters are hypersensitive to stochasticity during fine-tuning, whereas GPT-2 is more stable under the same treatment. It is also clear that the earlier layers of GPT-2 and BERT contain essential word-pattern information that should be preserved during fine-tuning.
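To illustrate the layer-freezing setup studied here, the following is a minimal sketch assuming the Hugging Face transformers library and PyTorch, which the abstract does not name; the model checkpoint, the number of frozen layers, and the learning rate are illustrative choices rather than the paper's reported configuration.

```python
import torch
from transformers import BertForSequenceClassification

# Load a pre-trained BERT model with a classification head
# (e.g., binary sentiment on financial text, for illustration).
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze the embeddings and the first N encoder layers so the early
# word-pattern information learned during pre-training is kept intact.
N_FROZEN_LAYERS = 6  # hypothetical value; the study varies this setting

for param in model.bert.embeddings.parameters():
    param.requires_grad = False

for layer in model.bert.encoder.layer[:N_FROZEN_LAYERS]:
    for param in layer.parameters():
        param.requires_grad = False

# Only the unfrozen parameters are optimized; batch size and learning
# rate are the other fine-tuning settings varied in the study.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)
```

The same pattern applies to GPT-2 via GPT2ForSequenceClassification, whose transformer blocks are exposed as model.transformer.h.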