We describe the development of a real-time smartphone app that lets users digitize paper receipts in a novel way: the user "waves" the phone over the receipts, and the app automatically detects and rectifies them for subsequent text recognition. We show that traditional computer vision algorithms for edge and corner detection do not robustly detect the non-linear and discontinuous edges and corners of a typical paper receipt in real-world settings. This is particularly the case when the colors of the receipt and background are similar, or when other interfering rectangular objects are present. Inaccurate detection of a receipt's corner positions then results in distorted images when a projective transformation is used to rectify the perspective. We propose an innovative solution to receipt corner detection: we treat each of the four corners as a unique "object" and train a Single Shot Detector (SSD) MobileNet object detection model to locate them. We train on a small amount of real data and a large amount of automatically generated synthetic data that is designed to resemble real-world imaging scenarios. We show that our proposed method robustly detects the four corners of a receipt, giving a receipt detection accuracy of 85.3% on real-world data, compared to only 36.9% with a traditional edge detection-based approach. Our method works even when the color of the receipt is virtually indistinguishable from the background. Moreover, our method is trained to detect only the corners of the central target receipt and implicitly learns to ignore other receipts and other rectangular objects. Adding the synthetic data to the real data further improves the trained model. These factors are a major advantage over traditional edge detection-based approaches, allowing us to deliver a much better experience to the user.
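To illustrate why accurate corner positions matter for rectification, the following is a minimal sketch of how four detected corners can determine a projective transformation (homography) onto an upright rectangle. It is not the authors' implementation, which the abstract does not describe. The function names, the corner ordering (top-left, top-right, bottom-right, bottom-left), the example coordinates, and the 300 x 600 output size are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_corners(src: Sequence[Point],
                            dst: Sequence[Point]) -> List[List[float]]:
    """Return the 3x3 matrix H (H[2][2] fixed to 1) mapping src[i] to dst[i]."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four corner correspondences are required")
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: List[List[float]], p: Point) -> Point:
    """Map one image point through H, including the perspective divide."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


if __name__ == "__main__":
    # Hypothetical detector output: TL, TR, BR, BL corners in pixel coordinates.
    detected = [(120.0, 80.0), (410.0, 95.0), (450.0, 700.0), (90.0, 680.0)]
    # Assumed upright target rectangle of 300 x 600 pixels.
    target = [(0.0, 0.0), (300.0, 0.0), (300.0, 600.0), (0.0, 600.0)]
    H = homography_from_corners(detected, target)
    for corner in detected:
        print(corner, "->", apply_homography(H, corner))
```

Because the eight unknowns of the homography are fixed entirely by the four corner pairs, any error in a detected corner propagates directly into the rectified image, which is consistent with the distortion the abstract attributes to inaccurate corner detection. In practice an image library would then resample every pixel through this mapping; that step is omitted here.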