We demonstrate that 1x1-convolutions in 1D time-channel separable convolutions may be replaced by constant, sparse random ternary matrices with weights in $\{-1,0,+1\}$. Such layers do not perform any multiplications and do not require training. Moreover, the matrices may be generated on the chip during computation and therefore do not require any memory access. With the same parameter budget, we can afford deeper and more expressive models, improving the Pareto frontiers of existing models on several tasks. For command recognition on Google Speech Commands v1, we improve the state-of-the-art accuracy from $97.21\%$ to $97.41\%$ at the same network size. Alternatively, we can lower the cost of existing models. For speech recognition on LibriSpeech, we halve the number of weights to be trained while sacrificing only about $1\%$ of the floating-point baseline's word error rate.
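As an illustration of the idea, here is a minimal NumPy sketch of a fixed, sparse random ternary matrix standing in for a pointwise (1x1) convolution. The function names (`ternary_matrix`, `ternary_pointwise_conv`) and the density value are hypothetical choices for this example, not the paper's implementation; the key properties it mirrors are that the weights are seeded (so they can be regenerated on demand rather than stored) and that applying them needs only additions and subtractions.

```python
import numpy as np

def ternary_matrix(out_ch, in_ch, density=0.25, seed=0):
    """Constant sparse random ternary matrix with entries in {-1, 0, +1}.

    Generated deterministically from a seed, so it needs no training and
    no stored weights; hardware could regenerate it during computation.
    (density and seed are illustrative parameters, not from the paper.)
    """
    rng = np.random.default_rng(seed)
    mask = rng.random((out_ch, in_ch)) < density       # sparsity pattern
    signs = rng.choice([-1, 1], size=(out_ch, in_ch))  # random +/- signs
    return (mask * signs).astype(np.int8)

def ternary_pointwise_conv(x, w):
    """Multiplication-free 1x1 convolution on a (channels, time) input.

    Each output channel is a sum of the input channels with weight +1
    minus a sum of those with weight -1, so only additions are needed.
    """
    out = np.zeros((w.shape[0], x.shape[1]), dtype=x.dtype)
    for o in range(w.shape[0]):
        out[o] = x[w[o] == 1].sum(axis=0) - x[w[o] == -1].sum(axis=0)
    return out

# Example: replace a trained 64 -> 128 pointwise convolution.
x = np.random.randn(64, 100).astype(np.float32)  # (channels, time)
w = ternary_matrix(128, 64)
y = ternary_pointwise_conv(x, w)                 # shape (128, 100)
```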