Boost AI Power: Data Augmentation Strategies with unlabelled Data and Conformal Prediction, a Case in Alternative Herbal Medicine Discrimination with Electronic Nose

Conformer · Model evaluation · Data augmentation · Boosting (an ensemble learning method) · Discriminator

5 February 2021

Li Liu, Xianghao Zhan, Rumeng Wu, Xiaoqing Guan, Zhan Wang, Wei Zhang, You Wang, Zhiyuan Luo, Guang Li

The electronic nose has proved effective in alternative herbal medicine classification, but because the approach is supervised, previous research relies on labelled training data, which are time-consuming and labor-intensive to collect. Considering the inadequacy of training data in real-world applications, this study aims to improve classification accuracy via data augmentation strategies. We simulated two scenarios to investigate the effectiveness of five data augmentation strategies under different degrees of training data inadequacy: in the noise-free scenario, different levels of unlabelled-data availability were simulated, and in the noisy scenario, different levels of Gaussian noise and translational shifts were added to simulate sensor drift. The five augmentation strategies (noise-adding data augmentation, semi-supervised learning, classifier-based online learning, inductive conformal prediction (ICP) online learning, and the novel ensemble ICP online learning proposed in this study) were compared against a supervised learning baseline, with Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) as the classifiers. We found that, in each task, at least one strategy significantly improved the classification accuracy with LDA (p ≤ 0.05) and showed non-decreasing classification accuracy with SVM. Moreover, our novel strategy, ensemble ICP online learning, outperformed the others by showing non-decreasing classification accuracy on all tasks and significant improvement on most tasks (25/36 tasks, p ≤ 0.05). This study provides a systematic analysis of augmentation strategies and recommends strategies for specific circumstances. Furthermore, our newly proposed strategy showed both effectiveness and robustness in boosting the generalizability of the classification model, and it can also be employed in other machine learning applications.
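To make the idea of ICP-based augmentation concrete, the following is a minimal sketch of how inductive conformal prediction could be used to pseudo-label unlabelled samples before adding them to the training set. It is not the authors' implementation. Several things here are assumptions made for illustration: a nearest-centroid model stands in for the LDA and SVM classifiers named in the abstract, the nonconformity score is the distance to a class centroid, an unlabelled sample is accepted only when its prediction set at the chosen significance level contains exactly one label, and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_centroids(X, y):
    """Nearest-centroid model (an assumed stand-in for LDA/SVM)."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nonconformity(X, centroids):
    """Distance from every sample to every class centroid.

    Larger values mean the sample conforms less to that class.
    Returns an array of shape (n_samples, n_classes).
    """
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)


def icp_p_values(X_new, centroids, calib_scores):
    """ICP p-value for each (sample, candidate label) pair."""
    scores = nonconformity(X_new, centroids)
    # Count calibration scores at least as large as each candidate score.
    ge = (calib_scores[None, None, :] >= scores[:, :, None]).sum(axis=2)
    return (ge + 1) / (len(calib_scores) + 1)


def icp_pseudo_label(X_train, y_train, X_calib, y_calib, X_unlab,
                     significance=0.05):
    """Return the unlabelled samples that receive exactly one label."""
    classes, centroids = fit_centroids(X_train, y_train)
    # Calibration scores use each calibration sample's true label.
    col = np.searchsorted(classes, y_calib)
    calib_scores = nonconformity(X_calib, centroids)[np.arange(len(y_calib)), col]
    p = icp_p_values(X_unlab, centroids, calib_scores)
    in_set = p > significance            # prediction set per sample
    keep = in_set.sum(axis=1) == 1       # accept singleton sets only
    labels = classes[in_set[keep].argmax(axis=1)]
    return X_unlab[keep], labels


# Synthetic two-class data (purely illustrative, not e-nose measurements).
rng = np.random.default_rng(0)


def make_blobs(n_per_class):
    X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, 4)),
                   rng.normal(10.0, 1.0, (n_per_class, 4))])
    y = np.repeat([0, 1], n_per_class)
    return X, y


X_train, y_train = make_blobs(10)
X_calib, y_calib = make_blobs(20)
X_unlab, _ = make_blobs(30)

X_add, y_add = icp_pseudo_label(X_train, y_train, X_calib, y_calib, X_unlab)
# Augmented training set: original labelled data plus accepted pseudo-labels.
X_aug = np.vstack([X_train, X_add])
y_aug = np.concatenate([y_train, y_add])
print(len(X_add), "of", len(X_unlab), "unlabelled samples accepted")
```

Accepting only singleton prediction sets is one conservative way to limit label noise in the augmented set. An online variant would presumably repeat this step as new unlabelled batches arrive, refitting the classifier on the augmented data each time.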
