IoT devices are constrained in resources such as processing power, RAM, and disk storage. These limitations become more evident when running demanding applications such as deep learning, which is well known for its heavy computational requirements. A case in point is robot pose estimation, an application that predicts the key points of a target object in an image. One way to mitigate the processing and storage problems is to compress the deep learning model. This paper proposes a new CNN for pose estimation and applies two compression techniques, pruning and quantization, to reduce its demands and improve response time. While pruning reduces the total number of parameters required for inference, quantization lowers the numerical precision of the floating-point values. We evaluate the approach on a pose estimation task for a robotic arm and compare the results on a high-end device and a constrained device. As metrics, we consider the number of Floating-point Operations Per Second (FLOPS), the total number of mathematical computations, the number of parameters, the inference time, and the number of video frames processed per second (FPS). In addition, we undertake a qualitative evaluation in which we compare the output image predicted by each pruned network with that of the original network. We prune the originally proposed network at a rate of 70%, which yields an 88.86% reduction in parameters, a 94.45% reduction in FLOPS, and a 70% reduction in disk storage, while increasing the error by only 1%. The image processing rate increases from 11.71 FPS to 41.9 FPS on the desktop and from 2.86 FPS to 10.04 FPS on the constrained device. The higher frame rate achieved by the proposed approach allows a much shorter response time.
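The two compression steps and the reported speedups can be illustrated with a small sketch. The abstract does not state the pruning criterion or the target precision, so the code below assumes L1-norm filter pruning and a float32-to-float16 conversion; the helper names (`prune_filters`, `to_float16`, `conv_params`) and the 64-channel layer are hypothetical and are not taken from the paper. It also shows why removing 70% of the filters can cut parameters by more than 70%: when both the input and the output channels of a convolution shrink, the weight count falls roughly quadratically.

```python
import struct


def prune_filters(filters, rate):
    """Drop the `rate` fraction of filters with the smallest L1 norm.

    Assumption: each filter is a flat list of weights and the pruning
    criterion is the L1 norm (the abstract does not name a criterion).
    The surviving filters keep their original order.
    """
    n_drop = int(len(filters) * rate)
    norms = [sum(abs(w) for w in f) for f in filters]
    ranked = sorted(range(len(filters)), key=lambda i: norms[i])
    drop = set(ranked[:n_drop])
    return [f for i, f in enumerate(filters) if i not in drop]


def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Assumption: 16-bit floats stand in for whatever precision the
    paper's quantization step actually targets.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def conv_params(c_in, c_out, k):
    """Parameter count of a k x k convolution: weights plus biases."""
    return c_in * c_out * k * k + c_out


# Hypothetical 3x3 layer with 64 input and 64 output channels.
full = conv_params(64, 64, 3)
kept = 64 - int(64 * 0.7)            # channels left after 70% pruning
pruned = conv_params(kept, kept, 3)  # both sides of the layer shrink
param_reduction = 1 - pruned / full

# Speedups implied by the frame rates reported in the abstract.
speedup_desktop = 41.9 / 11.71
speedup_constrained = 10.04 / 2.86

if __name__ == "__main__":
    print(f"parameter reduction: {param_reduction:.2%}")
    print(f"desktop speedup: {speedup_desktop:.2f}x")
    print(f"constrained-device speedup: {speedup_constrained:.2f}x")
    print(f"0.1 stored as float16: {to_float16(0.1)}")
```

In this toy layer the parameter reduction comes out near 90%, the same order as the 88.86% reported for the full network, and both devices show a speedup of roughly 3.5x.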