Designing adequate and precise neural architectures is a challenging task, often carried out by highly specialized personnel. AutoML is a field of machine learning that aims to generate well-performing models in an automated way. Spectral data, such as those obtained from biological analyses, generally carry a large amount of important information, and their image-like shape makes them particularly well suited to Convolutional Neural Networks (CNNs). In this work we present NASirt, an AutoML methodology based on Neural Architecture Search (NAS) that finds high-accuracy CNN architectures for spectral datasets. The proposed methodology relies on Item Response Theory (IRT) to obtain instance-level characteristics, such as discrimination and difficulty, and it is able to produce a ranking of top-performing submodels. Several experiments are performed to demonstrate the methodology's performance on different spectral datasets. Accuracy results are compared with other benchmark methods, such as a high-performing, manually crafted CNN and the Auto-Keras AutoML tool. The results show that our method performs better than the benchmarks in most cases, achieving average accuracy as high as 97.40%.
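To make the instance-level characteristics concrete, here is a minimal sketch of the standard two-parameter logistic (2PL) IRT model. The abstract does not say which IRT variant NASirt uses, so the 2PL choice is an assumption, and the function name `irt_2pl` and the example values are purely illustrative. In this reading, each candidate submodel plays the role of a respondent with ability `theta`, and each dataset instance is an item with discrimination `a` and difficulty `b`.

```python
import math


def irt_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL IRT model.

    theta: ability of the respondent (here, hypothetically, a submodel)
    a:     discrimination of the item (here, a dataset instance)
    b:     difficulty of the item
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Illustrative values only (assumed, not taken from the paper).
easy_item = {"a": 1.5, "b": -1.0}
hard_item = {"a": 1.5, "b": 2.0}

for theta in (-1.0, 0.0, 1.0):
    p_easy = irt_2pl(theta, **easy_item)
    p_hard = irt_2pl(theta, **hard_item)
    print(f"theta={theta:+.1f}  P(easy)={p_easy:.3f}  P(hard)={p_hard:.3f}")
```

Under this model, a respondent whose ability equals the item difficulty answers correctly with probability 0.5, and a larger discrimination makes the probability change more sharply around that point.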