There has been a recent surge of interest in Explainable AI (XAI), which tackles the problem of providing insights into the behavior of black-box machine learning models. Within this field, \textit{feature attribution} encompasses methods that assign relevance scores to input features and visualize them as a heatmap. Designing flexible accelerators for multiple such algorithms is challenging because the hardware mapping of these algorithms has not yet been studied. In this work, we first analyze the dataflow of gradient-backpropagation-based feature attribution algorithms to determine the resource overhead they require over inference. The gradient computation is optimized to minimize the memory overhead. Second, we develop a High-Level Synthesis (HLS) based configurable FPGA design that is targeted at edge devices and supports three feature attribution algorithms. Tile-based computation is employed to maximize the use of on-chip resources while adhering to the resource constraints. Representative CNNs are trained on the CIFAR-10 dataset and implemented on multiple Xilinx FPGAs using 16-bit fixed-point precision, demonstrating the flexibility of our library. Finally, through efficient reuse of allocated hardware resources, our design methodology demonstrates a pathway to repurpose inference accelerators to support feature attribution with minimal overhead, thereby enabling real-time XAI on the edge.
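As an illustrative example of this class of algorithms (not necessarily the three variants implemented in this work), vanilla saliency and input$\times$gradient both reduce to a single backward pass through the network: for a class score $S_c(\mathbf{x})$ and input feature $x_i$,
\[
R_i^{\text{saliency}} = \left|\frac{\partial S_c(\mathbf{x})}{\partial x_i}\right|,
\qquad
R_i^{\text{input}\times\text{grad}} = x_i \cdot \frac{\partial S_c(\mathbf{x})}{\partial x_i},
\]
so the additional cost over inference is dominated by the backward traversal of the network and the storage of the intermediate activations needed to evaluate these gradients.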