Detecting and extracting text from natural scene images requires Scene Text Detection (STD) algorithms. Instance-segmentation-based STD algorithms usually use Fully Convolutional Neural Networks (FCNs) as the backbone model for feature extraction, and FCNs come with high computational complexity. Furthermore, keeping up with the growing variety of models calls for flexible architectures. To accelerate various STD algorithms efficiently, we propose a hardware architecture that balances versatility and performance, together with a simple but efficient configuration method. The architecture can compute different FCN models without hardware redesign. Optimization is focused on the hardware, with finely designed computing modules, while reconfiguration for different networks is achieved through microcode rather than a painstakingly designed compiler. Multiple parallelization techniques at different levels and several complexity-reduction methods are explored to speed up FCN computation. Implementation results show that, given the same tasks, the proposed system achieves better throughput than the studied GPU. In particular, our system reduces the comprehensive Operation Expense (OpEx) by 46\% relative to the GPU, while improving power efficiency by 32\%. This work has been deployed in commercial applications and provides stable consumer text detection services.