We present a scalable in-pixel processing architecture that can reduce data throughput by 10X while consuming less than 30 mW per megapixel at the imager frontend. Unlike state-of-the-art (SOA) analog process-in-pixel (PIP) designs, which modulate the exposure time of the photosensors to perform matrix-vector multiplications, we use switched capacitors and pulse width modulation (PWM). This non-destructive approach decouples sensor exposure from computation, providing processing parallelism and high data fidelity. Our design minimizes computational complexity and chip density by leveraging patch-based feature extraction, which can perform as well as a CNN. We further reduce data by using partial observations of the attended objects, which perform close to full-frame observations. We have been studying the trade-off between the reduction of output features and accuracy, chip density, and power consumption, using a transformer-based backend model for object classification and detection.
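As a rough software illustration of the patch-based matrix-vector multiplication described above, the following sketch projects each non-overlapping pixel patch onto a small set of weight vectors. It is only a numerical model under stated assumptions, not the analog circuit: the function name `extract_patch_features`, the 8x8 patch size, the 6 features per patch, and the 4-bit pulse-width resolution are all hypothetical choices made for illustration and do not come from the paper.

```python
import numpy as np


def extract_patch_features(frame, weights, patch=8, pwm_bits=4):
    """Project each non-overlapping patch of `frame` onto the rows of `weights`.

    frame:   (H, W) array of pixel values; H and W must be multiples of `patch`.
    weights: (K, patch*patch) array, one weight vector per output feature.
    Returns a (H/patch, W/patch, K) array of features.
    All parameter values here are illustrative assumptions.
    """
    h, w = frame.shape
    # Model PWM: each weight magnitude is rounded to one of 2**pwm_bits - 1
    # discrete pulse widths; the sign is kept separately.
    levels = 2 ** pwm_bits - 1
    scale = np.abs(weights).max() or 1.0  # avoid dividing by zero
    duty = np.round(np.abs(weights) / scale * levels) / levels
    w_q = np.sign(weights) * duty * scale
    # Rearrange the frame into flattened patches without modifying it,
    # mirroring the non-destructive readout: shape (H/p, W/p, p*p).
    patches = (
        frame.reshape(h // patch, patch, w // patch, patch)
        .transpose(0, 2, 1, 3)
        .reshape(h // patch, w // patch, patch * patch)
    )
    # One matrix-vector product per patch.
    return patches @ w_q.T


rng = np.random.default_rng(0)
frame = rng.random((64, 64))            # stand-in for a 64x64 pixel array
weights = rng.standard_normal((6, 64))  # 6 hypothetical features per patch
features = extract_patch_features(frame, weights)
print(features.shape, frame.size / features.size)
```

With these assumed sizes, each 64-pixel patch is reduced to 6 values, a reduction of roughly 10.7 times in the number of output values, which is of the same order as the 10X throughput reduction stated above. The actual patch size, feature count, and weight resolution used in the design are not given in this abstract.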