We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classification benchmark tasks. The resulting hardware implementations are quantized, configurable, spatial dataflow architectures tailored for speed and efficiency, and they introduce new generic optimizations and common workflows developed as part of this work. The full workflow is presented from quantization-aware training to FPGA implementation. The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms. The resulting submissions achieve latencies as low as 20 $\mu$s and energy consumption as low as 30 $\mu$J per inference. We demonstrate how emerging ML benchmarks on heterogeneous hardware platforms can catalyze collaboration and the development of new techniques and more accessible tools.