With the emergence of Artificial Intelligence (AI), new attention has been given to implementing AI algorithms on resource-constrained tiny devices to expand the application domain of IoT. Multimodal learning has recently become very popular for classification tasks due to its impressive performance on both image and audio event classification. This paper presents TinyM$^2$Net -- a flexible system-algorithm co-designed multimodal learning framework for resource-constrained tiny devices. The framework was evaluated on two different case studies: COVID-19 detection from multimodal audio recordings and battlefield object detection from multimodal images and audios. To compress the model for implementation on tiny devices, substantial network architecture optimization and mixed-precision quantization (mixed 8-bit and 4-bit) were performed. TinyM$^2$Net shows that even a tiny multimodal learning model can achieve better classification performance than any unimodal counterpart. The most compressed TinyM$^2$Net achieves 88.4% COVID-19 detection accuracy (a 14.5% improvement over the unimodal base model) and 96.8% battlefield object detection accuracy (a 3.9% improvement over the unimodal base model). Finally, we evaluate our TinyM$^2$Net models on a Raspberry Pi 4 to assess their performance when deployed to a resource-constrained tiny device.
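The mixed-precision quantization mentioned above (8-bit for some layers, 4-bit for others) can be sketched as follows. This is a minimal illustrative simulation, not the paper's actual implementation: the layer names, the per-layer bit assignment, and the symmetric uniform quantizer are all assumptions chosen for clarity.

```python
import numpy as np

def quantize_dequantize(w, bits):
    """Simulate symmetric uniform quantization of a weight tensor at a given bit width."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.max(np.abs(w)) / qmax                # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)   # integer quantization levels
    return q * scale                                # dequantized ("fake-quantized") weights

# Hypothetical per-layer bit plan: sensitive layers keep 8 bits, the rest drop to 4 bits.
rng = np.random.default_rng(0)
layers = {"conv1": rng.standard_normal((3, 3)), "fc": rng.standard_normal((4, 4))}
bit_plan = {"conv1": 8, "fc": 4}
quantized = {name: quantize_dequantize(w, bit_plan[name]) for name, w in layers.items()}
```

In practice such a plan trades accuracy for memory: 4-bit layers use half the storage of 8-bit ones, so the bit assignment is typically chosen by measuring each layer's sensitivity to quantization error.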