This paper introduces Bhasha-Rupantarika, a lightweight and efficient multilingual translation system tailored through algorithm-hardware codesign for resource-constrained settings. The method investigates model deployment at sub-byte precision levels (FP8, INT8, INT4, and FP4); experimental results indicate a 4.1x reduction in model size (FP4) and a 4.2x speedup in inference, corresponding to a throughput of 66 tokens/s (a 4.8x improvement). These results underscore the importance of ultra-low-precision quantization for real-time deployment on IoT devices with FPGA accelerators, achieving performance on par with expectations. Our evaluation covers bidirectional translation between Indian and international languages, demonstrating adaptability in low-resource linguistic contexts. The FPGA deployment achieved a 1.96x reduction in LUTs and a 1.65x reduction in FFs, yielding a 2.2x throughput improvement over OPU and a 4.6x improvement over HPTA. Overall, the evaluation presents a viable solution combining quantization-aware translation with hardware efficiency for deployable multilingual AI systems. The complete code [https://github.com/mukullokhande99/Bhasha-Rupantarika/] and dataset are publicly available for reproducibility, facilitating rapid integration and further development by researchers.
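To make the sub-byte quantization idea concrete, the following is a minimal, illustrative sketch of symmetric per-tensor INT4 weight quantization in NumPy; it is an assumption-based toy example (function names and the per-tensor scaling scheme are ours, not the paper's actual pipeline), showing why packing two 4-bit values per byte yields roughly the model-size reductions reported.

```python
import numpy as np

def quantize_int4_symmetric(weights: np.ndarray):
    """Symmetric per-tensor quantization to the signed 4-bit range [-8, 7].

    Returns the quantized integers (stored in int8 containers) and the scale.
    """
    scale = float(np.max(np.abs(weights))) / 7.0  # map max magnitude to +/-7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 weights from INT4 codes and the scale."""
    return q.astype(np.float32) * scale

# Compare storage: FP32 vs. packed INT4 (two 4-bit codes per byte).
w = np.random.randn(1024, 1024).astype(np.float32)
q, s = quantize_int4_symmetric(w)
fp32_bytes = w.nbytes
int4_bytes = q.size // 2  # after bit-packing, two values share one byte
```

Relative to an FP16 baseline this packing gives a ~4x footprint reduction, consistent in spirit with the FP4 figures reported above; the rounding error of this scheme is bounded by half the scale per weight.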