The Number Theoretic Transform (NTT) is an essential mathematical tool for computing polynomial multiplication in promising lattice-based cryptographic schemes. However, costly division operations and complex data dependencies make efficient and flexible hardware design challenging, especially on resource-constrained edge devices. Existing approaches either support only a limited set of parameter settings or impose substantial hardware overhead. In this paper, we introduce a hardware-algorithm methodology that efficiently accelerates NTT across various parameter settings using in-cache computing. By leveraging an optimized bit-parallel modular multiplication and introducing costless shift operations, our proposed solution provides up to 29x higher throughput-per-area and 2.8-100x better throughput-per-area-per-joule compared to the state of the art.
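To make the two operations named here concrete, the following is a minimal software sketch, not the in-cache design itself. It assumes a toy modulus `Q = 17`, length `N = 8`, and root of unity `OMEGA = 9`, all chosen only for illustration. The function `mulmod_shift_add` is a generic textbook interleaved multiplier that replaces division with shifts, additions, and conditional subtractions; it is not claimed to be the bit-parallel multiplier proposed in this work. The transform is a naive O(n^2) cyclic NTT with twiddle factors computed by Python's built-in `pow`, whereas lattice schemes typically use a negacyclic variant with a fast butterfly structure.

```python
# Illustrative toy parameters (assumptions, not taken from the paper).
Q = 17      # small prime modulus with N | (Q - 1)
N = 8       # transform length
OMEGA = 9   # primitive N-th root of unity modulo Q (9**4 % 17 == 16, 9**8 % 17 == 1)


def mulmod_shift_add(a, b, q):
    """Return (a * b) mod q for 0 <= a, b < q using only shifts, adds,
    compares, and subtractions (interleaved modular multiplication)."""
    r = 0
    for i in reversed(range(a.bit_length())):
        r <<= 1                 # double the partial result
        if r >= q:              # r < 2q here, so one subtraction suffices
            r -= q
        if (a >> i) & 1:        # add b when the current bit of a is set
            r += b
            if r >= q:
                r -= q
    return r


def ntt(values, omega, q):
    """Naive transform: X[k] = sum_j values[j] * omega^(j*k) mod q.
    Inputs must already be reduced modulo q."""
    n = len(values)
    out = []
    for k in range(n):
        acc = 0
        for j, x in enumerate(values):
            twiddle = pow(omega, j * k, q)   # would be precomputed in practice
            acc += mulmod_shift_add(x, twiddle, q)
            if acc >= q:
                acc -= q
        out.append(acc)
    return out


def poly_mul_cyclic(f, g, omega=OMEGA, q=Q):
    """Multiply f and g modulo (x^n - 1, q): forward transforms,
    pointwise products, then the inverse transform scaled by n^-1."""
    n = len(f)
    f_hat = ntt(f, omega, q)
    g_hat = ntt(g, omega, q)
    h_hat = [mulmod_shift_add(x, y, q) for x, y in zip(f_hat, g_hat)]
    omega_inv = pow(omega, -1, q)    # modular inverse, Python 3.8+
    n_inv = pow(n, -1, q)
    h = ntt(h_hat, omega_inv, q)
    return [mulmod_shift_add(x, n_inv, q) for x in h]


if __name__ == "__main__":
    # Zero-padding keeps the true product below degree N, so no wrap-around occurs.
    f = [1, 2, 3, 4, 0, 0, 0, 0]
    g = [5, 6, 7, 0, 0, 0, 0, 0]
    print(poly_mul_cyclic(f, g))
```

The transform turns the quadratic-cost coefficient convolution into pointwise products, which is why NTT throughput largely determines polynomial multiplication speed in these schemes. Replacing the `%` operator with compare-and-subtract steps mirrors the motivation for avoiding division hardware.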