Along with the progress of AI democratization, machine learning (ML) has been successfully applied to edge applications such as smartphones and automated driving. Increasingly, applications require ML on tiny devices with extremely limited resources, such as implantable cardioverter defibrillators (ICDs); this setting is known as TinyML. Unlike ML on the edge, TinyML operates with a limited energy supply and therefore places higher demands on low-power execution. Stochastic computing (SC), which represents data as bitstreams, is promising for TinyML because it can perform the fundamental ML operations with simple logic gates instead of complex binary adders and multipliers. However, SC commonly suffers from low accuracy on ML tasks due to low data precision and inaccurate arithmetic units. Existing works mitigate the precision issue by increasing the bitstream length, but this incurs higher latency. In this work, we propose a novel SC architecture, namely Block-based Stochastic Computing (BSC). BSC divides inputs into blocks so that latency can be reduced by exploiting high data parallelism. Moreover, we propose optimized arithmetic units and an output revision (OUR) scheme to improve accuracy. On top of these, a global optimization approach is devised to determine the number of blocks, enabling a better latency-power trade-off. Experimental results show that BSC outperforms existing designs, achieving over 10% higher accuracy on ML tasks and over 6 times power reduction.
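To make the SC arithmetic mentioned above concrete, the following is a minimal sketch of conventional unipolar stochastic multiplication, not the BSC design itself. A value p in [0, 1] is encoded as a bitstream whose bits are 1 with probability p, a single AND gate multiplies two independent streams, and the result is decoded as the fraction of 1s. The function names (`to_bitstream`, `sc_multiply`, `decode`, `sc_product`) and the use of Python's `random` module as the stochastic number generator are illustrative assumptions; hardware SC designs typically use LFSR-based generators instead.

```python
import random


def to_bitstream(p, length, rng):
    """Unipolar SC encoding: each bit is 1 with probability p (0 <= p <= 1)."""
    # Assumption: a software PRNG stands in for a hardware stochastic number generator.
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a_bits, b_bits):
    """AND two independent unipolar streams; P(out = 1) = P(a = 1) * P(b = 1)."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def decode(bits):
    """Estimate the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)


def sc_product(a, b, length, seed=0):
    """Estimate a * b with a bitstream of the given length."""
    # Two differently seeded generators keep the streams (approximately)
    # uncorrelated, which the AND-gate multiplier relies on.
    rng_a = random.Random(seed)
    rng_b = random.Random(seed + 1)
    a_bits = to_bitstream(a, length, rng_a)
    b_bits = to_bitstream(b, length, rng_b)
    return decode(sc_multiply(a_bits, b_bits))


if __name__ == "__main__":
    exact = 0.5 * 0.75
    for length in (16, 256, 4096):
        estimate = sc_product(0.5, 0.75, length)
        print(f"length={length:5d}  estimate={estimate:.4f}  error={abs(estimate - exact):.4f}")
```

Running the loop typically shows the estimation error for 0.5 × 0.75 = 0.375 shrinking roughly as 1/sqrt(length). In a bit-serial implementation that consumes one bit per cycle (an assumption of this sketch), a longer stream therefore buys precision at the cost of proportionally more cycles, which is the precision-latency tension described in the abstract.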