Spectroscopic techniques such as X-ray absorption near edge structure (XANES) and Raman spectroscopy play an important role in characterizing materials. In the scientific literature, XANES/Raman data are usually plotted as line graphs, which is a visually appropriate representation when the end user is a human reader. However, such graphs are not amenable to direct programmatic analysis, and automatic tools for extracting their data are lacking. In this paper, we develop a plot digitizer, named Plot2Spectra, that automatically extracts data points from spectroscopy graph images, enabling large-scale data acquisition and analysis. Specifically, the plot digitizer is a two-stage framework. In the first stage, axis alignment, we adopt an anchor-free detector to detect the plot region and then refine the detected bounding boxes with an edge-based constraint to locate the two axes. We also apply a scene text detector to extract and interpret all tick information below the x-axis. In the second stage, plot data extraction, we first employ semantic segmentation to separate pixels belonging to plot lines from the background, and then apply optical flow constraints to the plot line pixels to assign each one to the line (data instance) it encodes. Extensive experiments validate the effectiveness of the proposed plot digitizer and suggest that such a tool could help accelerate the discovery and machine learning of materials properties.
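As an illustration of why the tick information matters, the sketch below shows one plausible way to turn detected x-axis ticks into a pixel-to-data mapping. It is not taken from Plot2Spectra itself: the function names `fit_axis` and `pixels_to_data`, the assumption of a linear axis, and the example tick positions and values are all hypothetical.

```python
import numpy as np


def fit_axis(tick_pixels, tick_values):
    """Fit a linear map from pixel coordinate to data value.

    Assumption: the axis is linear, and each tick label read by the
    text detector has been paired with the pixel position of its tick.
    """
    tick_pixels = np.asarray(tick_pixels, dtype=float)
    tick_values = np.asarray(tick_values, dtype=float)
    if tick_pixels.size < 2 or tick_pixels.size != tick_values.size:
        raise ValueError("need at least two matched (pixel, value) tick pairs")
    # A least-squares line tolerates small localization errors in the ticks.
    slope, intercept = np.polyfit(tick_pixels, tick_values, deg=1)
    return slope, intercept


def pixels_to_data(pixel_coords, slope, intercept):
    """Convert pixel coordinates along the axis to data values."""
    return slope * np.asarray(pixel_coords, dtype=float) + intercept


# Hypothetical x-axis ticks: pixel columns and the values read from labels.
tick_px = [100, 200, 300, 400]
tick_val = [7100.0, 7120.0, 7140.0, 7160.0]

slope, intercept = fit_axis(tick_px, tick_val)
energies = pixels_to_data([150, 250], slope, intercept)
```

Once such a mapping is available, the pixel columns of any extracted plot line can be converted to data values on the x-axis. A non-linear axis (for example, a logarithmic one) would need a different fit.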