The Viterbi algorithm is a key operator for structured sequence inference in modern data systems, with applications in trajectory analysis, online recommendation, and speech recognition. As these workloads increasingly migrate to resource-constrained edge platforms, standard Viterbi decoding remains memory-intensive and computationally inflexible. Existing methods typically trade decoding time for space efficiency, but often incur significant runtime overhead and lack adaptability to various system constraints. This paper presents FLASH Viterbi, a Fast, Lightweight, Adaptive, and Hardware-Friendly Viterbi decoding operator that enhances adaptability and resource efficiency. FLASH Viterbi combines a non-recursive divide-and-conquer strategy with pruning and parallelization techniques to enhance both time and memory efficiency, making it well-suited for resource-constrained data systems.To further decouple space complexity from the hidden state space size, we present FLASH-BS Viterbi, a dynamic beam search variant built on a memory-efficient data structure. Both proposed algorithms exhibit strong adaptivity to diverse deployment scenarios by dynamically tuning internal parameters.To ensure practical deployment on edge devices, we also develop FPGA-based hardware accelerators for both algorithms, demonstrating high throughput and low resource usage. Extensive experiments show that our algorithms consistently outperform existing baselines in both decoding time and memory efficiency, while preserving adaptability and hardware-friendly characteristics essential for modern data systems. All codes are publicly available at https://github.com/Dzh-16/FLASH-Viterbi.
翻译:维特比算法是现代数据系统中用于结构化序列推断的关键算子,广泛应用于轨迹分析、在线推荐和语音识别等领域。随着这些工作负载日益向资源受限的边缘平台迁移,标准的维特比解码依然存在内存密集和计算灵活性不足的问题。现有方法通常以解码时间换取空间效率,但往往带来显著的运行时开销,且缺乏对不同系统约束的自适应性。本文提出FLASH Viterbi,一种快速、轻量、自适应且硬件友好的维特比解码算子,旨在提升算法的适应性与资源效率。FLASH Viterbi结合了非递归的分治策略与剪枝及并行化技术,从而同时提高了时间和内存效率,使其特别适用于资源受限的数据系统。为了进一步将空间复杂度与隐状态空间大小解耦,我们提出了FLASH-BS Viterbi,这是一种基于内存高效数据结构的动态束搜索变体。所提出的两种算法通过动态调整内部参数,展现出对多样化部署场景的强适应性。为了确保在边缘设备上的实际部署,我们还为两种算法开发了基于FPGA的硬件加速器,实现了高吞吐量和低资源占用。大量实验表明,我们的算法在解码时间和内存效率上均持续优于现有基线方法,同时保持了现代数据系统所必需的适应性和硬件友好特性。所有代码已在 https://github.com/Dzh-16/FLASH-Viterbi 公开。