Point clouds are increasingly important in intelligent applications, but frequent off-chip memory traffic in accelerators causes pipeline stalls and leads to high energy consumption. While conventional line buffer techniques can eliminate off-chip traffic, they cannot be directly applied to point clouds due to their inherent computation patterns. To address this, we introduce two techniques: compulsory splitting and deterministic termination, enabling fully-streaming processing. We further propose StreamGrid, a framework that integrates these techniques and automatically optimizes on-chip buffer sizes. Our evaluation shows StreamGrid reduces on-chip memory by 61.3\% and energy consumption by 40.5\% with marginal accuracy loss compared to the baselines without our techniques. Additionally, we achieve 10.0$\times$ speedup and 3.9$\times$ energy efficiency over state-of-the-art accelerators.
翻译:暂无翻译