We introduce the Scanning Single Shot Detector (ScanSSD) for locating math formulas offset from text and embedded in textlines. ScanSSD uses only visual features for detection: no formatting or typesetting information such as layout, font, or character labels is employed. Given a 600 dpi document page image, a Single Shot Detector (SSD) locates formulas at multiple scales using sliding windows, after which candidate detections are pooled to obtain page-level results. For our experiments we use the TFD-ICDAR2019v2 dataset, a modification of the GTDB scanned math article collection. ScanSSD detects characters in formulas with high accuracy, obtaining a 0.926 f-score, and detects formulas with high recall overall. Detection errors are largely minor, such as splitting formulas at large whitespace gaps (e.g., for variable constraints) and merging formulas on adjacent textlines. Formula detection f-scores of 0.796 (IOU $\geq 0.5$) and 0.733 (IOU $\geq 0.75$) are obtained. Our data, evaluation tools, and code are publicly available.
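The IOU thresholds quoted above can be illustrated with a minimal sketch of intersection-over-union between a detected box and a ground-truth box. The `(x_min, y_min, x_max, y_max)` pixel box format and the function names `iou` and `is_match` are assumptions made for illustration; they are not taken from the ScanSSD evaluation tools.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x_min, y_min, x_max, y_max) in pixels.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(detected, truth, threshold=0.5):
    """Count a detection as correct when its IOU reaches the threshold."""
    return iou(detected, truth) >= threshold


# Hypothetical example: a detector splits a formula at a whitespace gap and
# returns only the left 60% of a 100 x 20 pixel ground-truth region.
truth = (0, 0, 100, 20)
split_detection = (0, 0, 60, 20)
print(iou(split_detection, truth))             # IOU = 1200 / 2000
print(is_match(split_detection, truth, 0.5))   # accepted at the looser threshold
print(is_match(split_detection, truth, 0.75))  # rejected at the stricter one
```

In this hypothetical case the split detection counts as a match at IOU $\geq 0.5$ but not at IOU $\geq 0.75$, which is one way that split or merged formulas can lower the f-score more at the stricter threshold.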