Deep-learning-based super-resolution achieves high-quality results, but its heavy computational workload, large buffer requirements, and high external memory bandwidth inhibit its use on mobile devices. To address these issues, this paper proposes a real-time hardware accelerator that uses a tilted layer fusion method to reduce external DRAM bandwidth by 92\% while requiring only 102 KB of on-chip memory. Implemented in a 40 nm CMOS process, the design achieves 1920x1080@60fps throughput with a gate count of 544.3K when running at 600 MHz, offering higher throughput and lower area cost than previous designs.
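As a rough sanity check on the real-time claim, the stated throughput and clock frequency imply a per-pixel cycle budget. The sketch below uses only the figures quoted in the abstract; treating 1920x1080 as the output resolution and assuming a single 600 MHz clock domain are assumptions made here, and the function names are hypothetical.

```python
# Figures quoted in the abstract.
WIDTH, HEIGHT, FPS = 1920, 1080, 60
CLOCK_HZ = 600_000_000  # 600 MHz


def output_pixels_per_second(width: int, height: int, fps: int) -> int:
    """Pixels the accelerator must deliver each second (assumed output resolution)."""
    return width * height * fps


def cycles_per_output_pixel(clock_hz: int, pixels_per_second: int) -> float:
    """Average clock cycles available per output pixel (assumes one clock domain)."""
    return clock_hz / pixels_per_second


pixel_rate = output_pixels_per_second(WIDTH, HEIGHT, FPS)
cycle_budget = cycles_per_output_pixel(CLOCK_HZ, pixel_rate)

print(f"pixel rate: {pixel_rate / 1e6:.1f} Mpixel/s")
print(f"cycle budget: {cycle_budget:.2f} cycles per output pixel")
```

Under these assumptions the design has fewer than five cycles per output pixel on average, which suggests why a highly parallel datapath and reduced DRAM traffic matter for meeting the frame rate.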