Recent advances in artificial intelligence have prompted the search for enhanced algorithms and hardware to support the deployment of machine learning at the edge. More specifically, in the context of the Internet of Things (IoT), vision chips must be able to fulfill tasks of low to medium complexity, such as feature extraction or region-of-interest (RoI) detection, with a sub-mW power budget imposed by the use of small batteries or energy harvesting. Mixed-signal vision chips relying on in- or near-sensor processing have emerged as interesting candidates, thanks to their favorable tradeoff between energy efficiency (EE) and computational accuracy compared to digital systems for these specific tasks. In this paper, we introduce a mixed-signal convolutional imager system-on-chip (SoC) codenamed MANTIS, featuring a unique combination of large 16$\times$16 4b-weighted filters, operation at multiple scales, and double sampling, well suited to the requirements of medium-complexity tasks. The main contributions are (i) circuits called DS3 units combining delta-reset sampling, image downsampling, and voltage downshifting, and (ii) charge-domain multiply-and-accumulate (MAC) operations based on switched-capacitor amplifiers and charge sharing in the capacitive DAC of the successive-approximation ADCs. MANTIS achieves peak EEs normalized to 1b operations of 4.6 and 84.1 TOPS/W at the accelerator and SoC levels, while computing feature maps with a root mean square error ranging from 3$\%$ to 11.3$\%$. It also demonstrates face RoI detection with a false negative rate of 11.5$\%$, while discarding 81.3$\%$ of image patches and reducing the data transmitted off chip by 13$\times$ compared to the raw image.
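To illustrate the charge-domain MAC principle mentioned above, the following is a minimal, purely behavioral Python sketch, not the MANTIS circuit itself: each 4b weight is modeled as a capacitor size, a 16$\times$16 patch of pixel voltages is combined by ideal charge sharing (yielding $\sum_i C_i V_i / \sum_i C_i$), and the result is quantized with a textbook successive-approximation search. The constants (`C_UNIT`, `VDD`, `N_BITS`) and the use of unsigned weights are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Illustrative constants (assumptions, not MANTIS design values)
C_UNIT = 1e-15      # unit capacitance [F]
VDD = 0.9           # pixel / ADC full-scale voltage [V]
N_BITS = 8          # SAR ADC resolution [bits]

def charge_domain_mac(pixels, weights):
    """Idealized charge sharing: V_out = sum(C_i * V_i) / sum(C_i),
    with each capacitor C_i sized by its (unsigned) 4b weight."""
    caps = weights * C_UNIT
    total_c = caps.sum()
    if total_c == 0:
        return 0.0
    return float((caps * pixels).sum() / total_c)

def sar_adc(v_in, n_bits=N_BITS, v_ref=VDD):
    """Textbook successive-approximation quantization of v_in against v_ref."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)
        # Keep the bit if the input exceeds the DAC voltage of the trial code
        if v_in >= (trial / (1 << n_bits)) * v_ref:
            code = trial
    return code

# Example: one 16x16 patch filtered by a 16x16 kernel of unsigned 4b weights
rng = np.random.default_rng(0)
patch = rng.uniform(0.0, VDD, size=(16, 16))   # sampled pixel voltages
filt = rng.integers(0, 16, size=(16, 16))      # 4b weights in [0, 15]

v_mac = charge_domain_mac(patch.ravel(), filt.ravel())
print("analog MAC voltage:", v_mac, "-> ADC code:", sar_adc(v_mac))
```

Note that ideal charge sharing produces a capacitance-normalized weighted average rather than a raw dot product; in a real implementation the scaling (and signed weights) would be handled by the switched-capacitor amplifier and readout arrangement.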