In many machine learning models, an input query is compared against the trained class vectors to find the class vector closest to it under the cosine similarity metric. However, computing cosine similarities between vectors on von Neumann machines involves a large number of multiplications, Euclidean normalizations, and divisions, incurring heavy hardware energy and latency overheads. Moreover, because of the memory wall problem present in conventional architectures, frequent cosine similarity-based searches (CSSs) over the class vectors require a large amount of data movement, limiting the throughput and efficiency of the system. To overcome these challenges, this paper introduces COSIME, a general in-memory associative memory (AM) engine based on the ferroelectric FET (FeFET) device for efficient CSS. By leveraging the one-transistor AND gate function of FeFET devices, a current-based translinear analog circuit, and winner-take-all (WTA) circuitry, COSIME can perform parallel in-memory CSS across all the entries in a memory block and output the word closest to the input query under the cosine similarity metric. Evaluation results at the array level suggest that the proposed COSIME design achieves 333X and 90.5X latency and energy improvements, respectively, and better classification accuracy than an AM design implementing approximated CSS. The proposed in-memory computing fabric is also evaluated on a hyperdimensional computing (HDC) problem, showing that COSIME achieves, on average, a 47.1X speedup and a 98.5X energy efficiency improvement over a GPU implementation.
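To make the search operation concrete, the following is a minimal software sketch of a cosine similarity-based search, not the COSIME circuit itself. It illustrates the multiplications, normalizations, and divisions that the abstract identifies as costly on conventional hardware, and ends with an argmax step that plays the role the winner-take-all circuitry plays in the analog design. The function names (`cosine_similarity`, `css_search`) and the small binary example vectors are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))      # one multiply per element
    norm_a = math.sqrt(sum(x * x for x in a))   # Euclidean norm of a
    norm_b = math.sqrt(sum(y * y for y in b))   # Euclidean norm of b
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)              # final division


def css_search(query, class_vectors):
    """Return (index, similarity) of the stored vector closest to the query."""
    best_index, best_sim = -1, -math.inf
    for i, vec in enumerate(class_vectors):     # sequential scan over all entries
        sim = cosine_similarity(query, vec)
        if sim > best_sim:                      # software stand-in for winner-take-all
            best_index, best_sim = i, sim
    return best_index, best_sim


# Hypothetical stored class vectors and query (illustrative values only).
class_vectors = [[1, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]
query = [1, 0, 1, 0]
index, similarity = css_search(query, class_vectors)
print(index, similarity)
```

In this sketch every stored vector is visited in turn and each visit requires fetching the vector from memory, which is the data-movement cost that an in-memory, parallel search is intended to avoid.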