Recurrent transducer models have emerged as a promising solution for speech recognition on current and next-generation smart devices. Transducer models provide competitive accuracy within a reasonable memory footprint, alleviating the memory capacity constraints of these devices. However, these models access parameters from off-chip memory at every input time step, which adversely affects device battery life and limits their usability on low-power devices. We address the transducer models' memory access concerns by optimizing their model architecture and designing novel recurrent cells. We demonstrate that i) the model's energy cost is dominated by fetching model weights from off-chip memory, ii) the transducer model architecture is pivotal in determining the number of off-chip memory accesses, and model size alone is not a good proxy, and iii) our transducer model optimizations and novel recurrent cell reduce off-chip memory accesses by 4.5x and model size by 2x with minimal accuracy impact.
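The dominance of off-chip weight fetches can be illustrated with a back-of-envelope energy model. The sketch below is purely illustrative: the per-byte and per-MAC energy constants are assumed order-of-magnitude figures for a generic mobile SoC, and the 30M-parameter model size is hypothetical, not a measurement from this work.

```python
# Back-of-envelope energy model for one transducer decoding step.
# All constants are illustrative assumptions (rough order-of-magnitude
# figures for a generic mobile SoC), not measurements from the paper.

DRAM_READ_PJ_PER_BYTE = 160.0   # assumed off-chip DRAM read energy
SRAM_READ_PJ_PER_BYTE = 1.25    # assumed on-chip SRAM read energy
MAC_PJ = 1.0                    # assumed energy per multiply-accumulate

def step_energy_uj(params, bytes_per_param=1, weights_on_chip=False):
    """Energy (microjoules) to read every weight once and run one MAC
    per weight, as happens each input time step in a recurrent model."""
    read_pj = SRAM_READ_PJ_PER_BYTE if weights_on_chip else DRAM_READ_PJ_PER_BYTE
    weight_bytes = params * bytes_per_param
    return (weight_bytes * read_pj + params * MAC_PJ) / 1e6

# Hypothetical 30M-parameter, 8-bit-quantized transducer:
off_chip = step_energy_uj(30_000_000)                       # weights in DRAM
on_chip = step_energy_uj(30_000_000, weights_on_chip=True)  # weights in SRAM
```

Under these assumed constants the off-chip case costs tens of times more energy per step than the on-chip case, which is why reducing off-chip accesses (rather than compute or raw model size) is the lever the abstract emphasizes.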