The Bidirectional Encoder Representations from Transformers (BERT) model has radically improved the performance of many Natural Language Processing (NLP) tasks such as Text Classification and Named Entity Recognition (NER). However, it is challenging to scale BERT for low-latency and high-throughput industrial use cases due to its enormous size. We successfully optimize a Query-Title Relevance (QTR) classifier for deployment via a compact model, which we name BERT Bidirectional Long Short-Term Memory (BertBiLSTM). The model is capable of performing inference on an input in at most 0.2 ms on CPU. BertBiLSTM exceeds the off-the-shelf BERT model's performance in terms of accuracy and efficiency for the aforementioned real-world production task. We achieve this result in two phases. First, we create a pre-trained model, called eBERT, which is the original BERT architecture trained on our unique item title corpus. We then fine-tune eBERT for the QTR task. Second, we train the BertBiLSTM model to mimic the eBERT model's performance through a process called Knowledge Distillation (KD), and show the effect of data augmentation in achieving this imitation goal. Experimental results show that the proposed model outperforms other compact and production-ready models.
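To make the distillation step concrete, the sketch below shows a standard soft-target KD loss in PyTorch, in which a student (e.g., a BiLSTM classifier) is trained to match the teacher's softened output distribution alongside the hard labels. The function name, temperature, and loss weighting are illustrative assumptions, not the configuration used for BertBiLSTM.

```python
# Minimal knowledge-distillation sketch (not the authors' exact recipe).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of soft-target KL loss and hard-label cross-entropy.

    `temperature` and `alpha` are illustrative hyperparameters, not values
    reported here.
    """
    # Soften both distributions with the temperature before comparing them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between softened distributions, scaled by T^2 (as in
    # Hinton et al., 2015) to keep gradient magnitudes comparable.
    soft_loss = F.kl_div(log_soft_student, soft_teacher,
                         reduction="batchmean") * (temperature ** 2)
    # Standard cross-entropy on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

In this setup, data augmentation simply enlarges the pool of unlabeled or pseudo-labeled inputs on which the teacher's soft targets are computed, giving the student more signal to imitate.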