In this paper we show how to process NOTAM (Notice to Airmen) data from the civil aviation field. The main research contents are as follows: 1. Data preprocessing: The original NOTAM data mixes Chinese and English and is poorly structured. The original data is cleaned, the Chinese and English data are processed separately, word segmentation is completed, and stop words are removed. The data is then represented with GloVe word vectors using a custom mapping vocabulary. 2. Decoupling features and classifiers: To improve the text classification model's ability to recognize minority-class samples, the overall training process is decoupled at the algorithm level into two stages, feature learning and classifier learning. The weights in the feature learning stage and the classifier learning stage follow different strategies, to overcome the influence of the head and tail data of the imbalanced data set on the classification model. Experiments show that decoupling feature learning from classifier learning in a neural network classification model can complete multi-class text classification tasks in the civil aviation field while also improving recognition accuracy on the minority samples in the data set.
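The preprocessing step described above (separating Chinese and English text, segmenting, and removing stop words) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function names, the tiny stop-word sets, and the per-character Chinese segmenter, which only stands in for whatever dictionary-based segmenter the actual pipeline uses.

```python
import re

# Hypothetical stop-word sets; a real pipeline would load much fuller lists.
EN_STOP_WORDS = {"the", "a", "an", "of", "to", "is", "in", "and"}
ZH_STOP_WORDS = {"的", "了", "和", "在"}

# Runs of CJK Unified Ideographs (basic block) versus ASCII letters and digits.
ZH_RUN = re.compile(r"[\u4e00-\u9fff]+")
EN_TOKEN = re.compile(r"[A-Za-z0-9]+")


def split_by_language(text):
    """Return (chinese_runs, english_tokens) from a mixed-language string."""
    zh_runs = ZH_RUN.findall(text)
    en_tokens = [tok.lower() for tok in EN_TOKEN.findall(text)]
    return zh_runs, en_tokens


def segment_chinese(run):
    """Placeholder segmenter (assumption): one token per character.

    A dictionary-based word segmenter would normally replace this.
    """
    return list(run)


def preprocess(text):
    """Split by language, segment, and drop stop words for each language."""
    zh_runs, en_tokens = split_by_language(text)
    zh_tokens = [tok for run in zh_runs for tok in segment_chinese(run)]
    zh_clean = [tok for tok in zh_tokens if tok not in ZH_STOP_WORDS]
    en_clean = [tok for tok in en_tokens if tok not in EN_STOP_WORDS]
    return zh_clean, en_clean
```

Calling `preprocess` on a mixed string returns two token lists, one per language, which could then be mapped to vocabulary indices and word vectors in a later step.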