In industrial NLP applications, manually labeled data contains a certain amount of noise. We present a simple method to find the noisy examples and relabel them manually, collecting the correction information along the way. We then present a novel method for incorporating this human correction information into a deep learning model. Because humans know how to correct noisy labels, the correction information can be injected into the model. We run experiments on our own manually labeled text classification dataset, since we relabeled the noisy data in it for our industrial application. The results show that our method improves classification accuracy from 91.7% to 92.5%. The 91.7% baseline is BERT trained on the corrected dataset, which is a hard baseline to surpass.