We propose a novel end-to-end solution that performs hierarchical layout analysis of screenshots and document images on resource-constrained devices such as mobile phones. Our approach segments entities such as Grid, Image, Text, and Icon blocks occurring in a screenshot. We provide an option for smart editing by automatically highlighting these entities for saving or sharing. Further, this multi-level layout analysis of screenshots has many use cases, including content extraction, keyword-based image search, and style transfer. We address the limitations of known baseline approaches, support a wide variety of semantically complex screenshots, and develop an approach that is highly optimized for on-device deployment. In addition, we present a novel weighted NMS (non-maximum suppression) technique for filtering object proposals. We achieve an average precision of about 0.95 with a latency of around 200 ms on a Samsung Galaxy S10 device for a screenshot of 1080p resolution. The solution pipeline is already commercialized in Samsung device applications, namely Samsung Capture, Smart Crop, My Filter in the Camera application, and Bixby Touch.
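The abstract does not describe how the weighted NMS works, so the following is only a minimal sketch of one common generic variant, not the authors' technique. It assumes that proposals overlapping the highest-scoring box are merged into a score-weighted average box instead of being discarded. The names `iou` and `weighted_nms`, the box layout `(x1, y1, x2, y2)`, and the default IoU threshold of 0.5 are all illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def weighted_nms(
    boxes: List[Box], scores: List[float], iou_threshold: float = 0.5
) -> List[Tuple[Box, float]]:
    """Generic weighted NMS (hypothetical helper, not the paper's method).

    Repeatedly takes the highest-scoring remaining box, gathers every
    remaining box whose IoU with it reaches the threshold, and replaces
    the group with its score-weighted average box.
    """
    # Indices sorted by descending score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    merged: List[Tuple[Box, float]] = []
    while order:
        top = order[0]
        # The top box always belongs to its own cluster, even if degenerate.
        cluster = [top] + [
            i for i in order[1:] if iou(boxes[top], boxes[i]) >= iou_threshold
        ]
        total = sum(scores[i] for i in cluster)
        if total > 0:
            # Score-weighted average of each coordinate over the cluster.
            box = tuple(
                sum(scores[i] * boxes[i][k] for i in cluster) / total
                for k in range(4)
            )
        else:
            # All scores are zero: fall back to the top box unchanged.
            box = boxes[top]
        merged.append((box, scores[top]))
        in_cluster = set(cluster)
        order = [i for i in order if i not in in_cluster]
    return merged


if __name__ == "__main__":
    proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    confidences = [0.9, 0.6, 0.8]
    for box, score in weighted_nms(proposals, confidences):
        print(box, score)
```

Compared with standard NMS, which keeps only the top box of each overlapping group, this variant lets lower-scoring proposals still influence the final coordinates. Whether the paper weights by score, overlap, or some other quantity is not stated in the abstract.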