With the growth of the Internet of Things and the rise of Big Data, data processing and machine learning applications are being moved to cheap, low size, weight, and power (SWaP) devices at the edge, often in the form of mobile phones, embedded systems, or microcontrollers. The field of Cyber-Physical Measurement and Signature Intelligence (MASINT) makes use of these devices to analyze and exploit data in ways not otherwise possible, resulting in increased data quality, increased security, and decreased bandwidth usage. However, methods to train and deploy models at the edge are limited, and models with sufficient accuracy are often too large for the edge device. There is therefore a clear need for techniques to create efficient AI/ML at the edge. This work presents training techniques for audio models in the field of environmental sound classification at the edge. Specifically, we design and train Transformers to classify office sounds in audio clips. Results show that a BERT-based Transformer, trained on Mel spectrograms, can outperform a CNN while using 99.85% fewer parameters. To achieve this result, we first evaluated several audio feature extraction techniques designed for Transformers, along with various augmentations, using the ESC-50 dataset. Our final model outperforms the state-of-the-art MFCC-based CNN on the office sounds dataset while using just over 6,000 parameters, small enough to run on a microcontroller.
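As a rough illustration of the spectrogram-to-Transformer pipeline described above, the sketch below builds a small Mel-spectrogram front end feeding a single Transformer encoder layer with a mean-pooled classification head. It uses PyTorch and torchaudio; the layer sizes, class count, pooling scheme, and omission of positional encodings are illustrative assumptions, not the paper's actual architecture or parameter budget.

```python
# Minimal sketch (assumed dimensions, not the paper's model): waveform ->
# Mel spectrogram -> per-frame tokens -> Transformer encoder -> clip logits.
import torch
import torch.nn as nn
import torchaudio

SAMPLE_RATE = 16_000   # assumed clip sample rate
NUM_CLASSES = 50       # e.g. ESC-50; the office sounds dataset differs

class TinyAudioTransformer(nn.Module):
    def __init__(self, n_mels=64, d_model=16, nhead=2, num_classes=NUM_CLASSES):
        super().__init__()
        self.mel = torchaudio.transforms.MelSpectrogram(
            sample_rate=SAMPLE_RATE, n_fft=400, hop_length=160, n_mels=n_mels)
        self.to_db = torchaudio.transforms.AmplitudeToDB()
        self.proj = nn.Linear(n_mels, d_model)   # one Mel frame -> one token
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=32, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, wav):                  # wav: (batch, samples)
        x = self.to_db(self.mel(wav))        # (batch, n_mels, frames)
        x = self.proj(x.transpose(1, 2))     # (batch, frames, d_model)
        x = self.encoder(x)                  # self-attention over time frames
        return self.head(x.mean(dim=1))      # mean-pool frames, classify clip

model = TinyAudioTransformer()
logits = model(torch.randn(2, SAMPLE_RATE))  # two 1-second dummy clips
print(logits.shape, sum(p.numel() for p in model.parameters()))
```

With the assumed dimensions the encoder stays in the few-thousand-parameter range, which is the regime the abstract targets for microcontroller deployment, but the exact 6,000-parameter configuration reported in the paper is not reproduced here.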