In this paper, we consider a different data format for images: vector graphics. In contrast to raster graphics, which are widely used in image recognition, vector graphics can be scaled up or down to any resolution without aliasing or information loss, due to the analytic representation of the primitives in the document. Furthermore, vector graphics provide extra structural information on how low-level elements group together to form high-level shapes or structures. These merits of vector graphics have not been fully leveraged in existing methods. To explore this data format, we target the fundamental recognition tasks: object localization and classification. We propose YOLaT (You Only Look at Text), an efficient CNN-free pipeline that does not render the graphic into pixels (i.e., rasterization) and instead takes the textual document of the vector graphic as input. YOLaT builds multi-graphs to model the structural and spatial information in vector graphics, and a dual-stream graph neural network is proposed to detect objects from the graph. Our experiments show that by directly operating on vector graphics, YOLaT outperforms raster-graphics-based object detection baselines in terms of both average precision and efficiency.
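As a rough illustration of the graph-construction step described above, the sketch below reads the textual SVG document directly (no rasterization) and builds a graph whose nodes are primitive endpoints and whose two edge sets capture connectivity along each primitive and spatial proximity between primitives. This is a minimal sketch of the general idea, not the authors' implementation: the restriction to `line` elements, the edge-set names, and the distance threshold are illustrative assumptions.

```python
# Minimal sketch: parse an SVG textual document and build a multi-graph
# with two edge types, as a stand-in for YOLaT's graph construction.
# Element handling and the `radius` threshold are assumptions for
# illustration, not details from the paper.
import math
import xml.etree.ElementTree as ET

SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <line x1="0" y1="0" x2="10" y2="0"/>
  <line x1="10" y1="0" x2="10" y2="8"/>
  <line x1="30" y1="30" x2="40" y2="30"/>
</svg>"""

NS = "{http://www.w3.org/2000/svg}"

def build_graph(svg_text, radius=2.0):
    nodes = []         # (x, y) coordinates, one node per primitive endpoint
    stroke_edges = []  # edges along each primitive (structural information)
    root = ET.fromstring(svg_text)
    for elem in root.iter(NS + "line"):
        p = (float(elem.get("x1")), float(elem.get("y1")))
        q = (float(elem.get("x2")), float(elem.get("y2")))
        nodes.append(p)
        nodes.append(q)
        stroke_edges.append((len(nodes) - 2, len(nodes) - 1))
    # Spatial edges link nearby endpoints of different primitives, so a
    # GNN can reason about how low-level elements group into shapes.
    spatial_edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if (i, j) in stroke_edges:
                continue
            if math.dist(nodes[i], nodes[j]) <= radius:
                spatial_edges.append((i, j))
    return nodes, stroke_edges, spatial_edges

nodes, stroke, spatial = build_graph(SVG)
# The two connected lines share the endpoint (10, 0), so their nodes get
# a spatial edge; the distant third line stays in its own component.
print(len(nodes), "nodes |", stroke, "stroke edges |", spatial, "spatial edges")
```

In this toy example, the first two lines meet at (10, 0) and are joined by a spatial edge, while the third line remains disconnected, mirroring how the graph exposes which primitives form a single high-level shape before any neural network is applied.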