This thesis develops signal-processing algorithms and implementation schemes under the constraints of minimal parallelism and minimal memory space, with the goal of improving the energy efficiency of low-power computing hardware. We propose (i) a power and energy consumption model for clocked CMOS logic that supports selecting the optimal degree of parallelism; (ii) integer-friendly approximation methods for elementary functions that reduce lookup-table size through constrained piecewise-polynomial (quasi-spline) constructions with guaranteed accuracy; (iii) provably conflict-free data placement and execution orders for mixed-radix streaming FFTs on multi-bank and single-port memories, including a self-sorting FFT variant; and (iv) an analysis of the parallelism and memory requirements of the fast Schur algorithm for superfast solution of Toeplitz systems, motivated by echo-cancellation workloads. The results comprise constructive theorems, schedules, and design trade-offs that enable efficient specialized accelerators.
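As a point of reference for contribution (iii), the classic digit-sum bank mapping for a single-radix FFT can be checked in a few lines. This is a minimal sketch under stated assumptions, not the thesis's mixed-radix or single-port construction: it assumes N = r^m points, r memory banks, and an in-place schedule in which each radix-r butterfly at stage s reads the r addresses that differ only in base-r digit s. The function names `bank_of`, `butterfly_groups`, and `is_conflict_free` are hypothetical.

```python
def digits(addr: int, radix: int, m: int) -> list[int]:
    """Base-`radix` digits of `addr`, least significant first (exactly m digits)."""
    out = []
    for _ in range(m):
        addr, d = divmod(addr, radix)
        out.append(d)
    return out


def bank_of(addr: int, radix: int, m: int) -> int:
    """Assumed classic mapping: bank = (sum of base-radix digits) mod radix."""
    return sum(digits(addr, radix, m)) % radix


def butterfly_groups(radix: int, m: int, stage: int):
    """Yield the address groups read by each radix-`radix` butterfly at `stage`
    of an N = radix**m point FFT: addresses that differ only in digit `stage`."""
    step = radix ** stage
    n = radix ** m
    for base in range(n):
        if (base // step) % radix == 0:  # digit `stage` of base is zero
            # Adding k*step sets that digit to k without any carry.
            yield [base + k * step for k in range(radix)]


def is_conflict_free(bank_fn, radix: int, m: int) -> bool:
    """True if every butterfly at every stage touches `radix` distinct banks
    under the mapping bank_fn(addr, radix, m)."""
    for stage in range(m):
        for group in butterfly_groups(radix, m, stage):
            banks = {bank_fn(a, radix, m) for a in group}
            if len(banks) != radix:
                return False
    return True


def naive_bank(addr: int, radix: int, m: int) -> int:
    """Hypothetical naive mapping for comparison: low-order interleaving."""
    return addr % radix


if __name__ == "__main__":
    print(is_conflict_free(bank_of, 4, 3))     # digit-sum mapping, N = 64
    print(is_conflict_free(naive_bank, 4, 3))  # naive interleaving, N = 64
```

Because the r addresses of one butterfly differ only in digit s, which takes every value from 0 to r-1, their digit sums are consecutive integers and therefore land in r distinct banks modulo r. The naive mapping `addr % r` only varies with digit 0, so it conflicts at every stage s >= 1.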