In industrial NLP applications, our manually labeled data contains a certain amount of noisy data. We present a simple method to find the noisy data and re-label it manually, while collecting the correction information. We then present a novel method to incorporate this human correction information into a deep learning model. Humans know how to correct noisy data, so their correction information can be injected into the deep learning model. We conduct experiments on our own manually labeled text classification dataset, since that is the dataset we re-labeled for our industrial application. The experimental results show that our method improves classification accuracy from 91.7% to 92.5%. The 91.7% accuracy is obtained by training on the corrected dataset; this correction alone raises accuracy from the 83.3% baseline to 91.7%.
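The abstract does not specify how noisy examples are found or how corrections are recorded. The sketch below shows one common heuristic, offered purely as an assumption and not as the authors' method: flag every example whose label disagrees with an out-of-fold prediction (a prediction from a model that never saw that example, e.g. under k-fold cross-validation), let a human re-label the flagged examples, and log each changed label as an (index, old label, new label) record. The names `flag_candidates`, `apply_relabels`, and `Correction`, and the example labels, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Correction:
    """One human correction: which example, its old label, and its new label."""
    index: int
    old_label: str
    new_label: str


def flag_candidates(labels: Sequence[str], oof_preds: Sequence[str]) -> List[int]:
    """Return indices where the out-of-fold prediction disagrees with the label.

    Hypothetical heuristic: oof_preds[i] is assumed to come from a classifier
    that was not trained on example i (e.g. k-fold cross-validation).
    """
    if len(labels) != len(oof_preds):
        raise ValueError("labels and oof_preds must have the same length")
    return [i for i, (y, p) in enumerate(zip(labels, oof_preds)) if y != p]


def apply_relabels(
    labels: Sequence[str], relabels: Dict[int, str]
) -> Tuple[List[str], List[Correction]]:
    """Apply human re-labels to a copy of the labels and log every actual change.

    A flagged example the human confirms as correct produces no log entry.
    """
    corrected = list(labels)
    log: List[Correction] = []
    for i, new_label in sorted(relabels.items()):
        if corrected[i] != new_label:
            log.append(Correction(i, corrected[i], new_label))
            corrected[i] = new_label
    return corrected, log


if __name__ == "__main__":
    labels = ["sports", "finance", "sports", "tech"]
    oof_preds = ["sports", "sports", "sports", "finance"]
    print(flag_candidates(labels, oof_preds))  # [1, 3]
    # A human reviews indices 1 and 3: index 1 was mislabeled, index 3 was fine.
    corrected, log = apply_relabels(labels, {1: "sports", 3: "tech"})
    print(corrected)  # ['sports', 'sports', 'sports', 'tech']
    print(log)  # [Correction(index=1, old_label='finance', new_label='sports')]
```

Keeping the correction log separate from the corrected labels preserves exactly the kind of human correction information that a model could later consume; how the paper actually injects it into the deep learning model is not described in the abstract.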