Due to the boom in technical compute in the last few years, the world has seen massive advances in artificially intelligent systems solving diverse real-world problems. A major roadblock to the ubiquitous adoption of these models, however, is their enormous computational complexity and memory footprint. Efficient architectures and training techniques are therefore required for deployment on extremely low-resource inference endpoints. This paper proposes an architecture for detecting American Sign Language alphabets on an ARM Cortex-M7 microcontroller with just 496 KB of framebuffer RAM. Parameter quantization is a commonly leveraged technique, but it can cause varying drops in test accuracy. This paper proposes using interpolation as augmentation, among other techniques, as an efficient way of reducing this drop, which also helps the model generalize well to previously unseen noisy data. The proposed model is about 185 KB post-quantization, and inference runs at 20 frames per second.
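As an illustrative sketch only (the abstract does not specify the framework or pipeline), full-integer post-training quantization of the kind commonly used for Cortex-M deployment can be performed with the TensorFlow Lite converter. The model file, calibration array, and sample count below are hypothetical placeholders.

```python
# Sketch of full-integer (int8) post-training quantization with TensorFlow Lite,
# a common route to ARM Cortex-M deployment via TFLite Micro.
# The model path and calibration data are hypothetical, for illustration only.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("asl_cnn.h5")      # hypothetical trained classifier
calib_images = np.load("calibration_images.npy")      # small, representative image sample

def representative_dataset():
    # Yield a few hundred preprocessed inputs so the converter can
    # calibrate quantization ranges for weights and activations.
    for img in calib_images[:200]:
        yield [img[np.newaxis, ...].astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8              # int8 I/O keeps the MCU pipeline integer-only
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("asl_model_int8.tflite", "wb") as f:
    f.write(tflite_model)                             # flatbuffer to embed on the microcontroller
```

Full-integer quantization of this kind typically shrinks a float32 model by roughly 4x, consistent with the approximately 185 KB post-quantization size reported above; the resulting accuracy drop depends on the calibration data and the augmentation strategy used during training.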