BERT achieves remarkable results in text classification tasks, yet it is not fully exploited, since only the last layer is typically used as the representation output for downstream classifiers. Recent studies on the nature of the linguistic features learned by BERT suggest that different layers focus on different kinds of linguistic features. We propose a CNN-Enhanced Transformer-Encoder model that is trained on top of fixed BERT $[CLS]$ representations from all layers, employing Convolutional Neural Networks to generate QKV feature maps inside the Transformer-Encoder instead of linear projections of the input into the embedding space. CNN-Trans-Enc is relatively small as a downstream classifier and does not require any fine-tuning of BERT, as it makes optimal use of the $[CLS]$ representations from all layers, leveraging different linguistic features through more meaningful and generalizable QKV representations of the input. Using BERT with CNN-Trans-Enc retains $98.9\%$ and $94.8\%$ of current state-of-the-art performance on the IMDB and SST-5 datasets respectively, while obtaining a new state-of-the-art on YELP-5 with $82.23$ ($8.9\%$ improvement) and on Amazon-Polarity with $0.98\%$ ($0.2\%$ improvement) (K-fold cross-validation on a 1M-sample subset from both datasets). On the AG News dataset, CNN-Trans-Enc achieves $99.94\%$ of the current state-of-the-art, and it reaches a new top performance with an average accuracy of $99.51\%$ on DBPedia-14.

Index terms: Text Classification, Natural Language Processing, Convolutional Neural Networks, Transformers, BERT
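A minimal sketch may make the described mechanism concrete: the per-layer $[CLS]$ vectors from a frozen BERT are stacked into a short sequence, and 1D convolutions (rather than linear projections) produce the Q, K, and V feature maps consumed by multi-head attention. The PyTorch module below is an illustrative assumption of this structure; the layer count (13 for BERT-base, i.e. the embedding output plus 12 encoder layers), kernel size, head count, and mean pooling are placeholders, not the paper's exact hyperparameters.

\begin{verbatim}
# Illustrative sketch only: hyperparameters and pooling are assumptions,
# not the authors' reported configuration.
import torch
import torch.nn as nn


class CNNTransEncBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=8, kernel_size=3, ff_dim=1024):
        super().__init__()
        # Conv1d projections replace the usual nn.Linear Q/K/V maps.
        pad = kernel_size // 2
        self.q_conv = nn.Conv1d(d_model, d_model, kernel_size, padding=pad)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel_size, padding=pad)
        self.v_conv = nn.Conv1d(d_model, d_model, kernel_size, padding=pad)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, ff_dim), nn.ReLU(), nn.Linear(ff_dim, d_model)
        )

    def forward(self, x):
        # x: (batch, n_bert_layers, d_model), one [CLS] vector per BERT layer.
        c = x.transpose(1, 2)               # (batch, d_model, n_layers) for Conv1d
        q = self.q_conv(c).transpose(1, 2)  # CNN feature maps used as Q, K, V
        k = self.k_conv(c).transpose(1, 2)
        v = self.v_conv(c).transpose(1, 2)
        attn_out, _ = self.attn(q, k, v)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ff(x))


class CNNTransEncClassifier(nn.Module):
    def __init__(self, n_classes, d_model=768):
        super().__init__()
        self.block = CNNTransEncBlock(d_model)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, cls_stack):
        # cls_stack: frozen [CLS] outputs from every BERT layer, (batch, 13, 768).
        h = self.block(cls_stack)
        return self.head(h.mean(dim=1))     # pool over the layer axis


# Usage with dummy data: 13 [CLS] vectors per example, 5 target classes.
logits = CNNTransEncClassifier(n_classes=5)(torch.randn(4, 13, 768))
\end{verbatim}

In this reading, the convolutions mix information across neighbouring BERT layers before attention is applied, which is one plausible way the model exploits the different linguistic features captured at different depths.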