Microscopic characterization techniques such as Scanning Electron Microscopy (SEM) are widely used in scientific research for visualizing and analyzing microstructures. Determining the scale bar is an essential first step in accurate SEM analysis; however, this step currently relies mainly on manual operation, which is both time-consuming and error-prone. To address this issue, we propose a multi-modal, automated scale bar detection and extraction framework that performs concurrent object detection, text detection, and text recognition with a Large Language Model (LLM) agent. The proposed framework operates in four phases: i) an Automatic Dataset Generation (Auto-DG) model that synthesizes a diverse dataset of SEM images, ensuring robust training and high generalizability; ii) scale bar object detection; iii) information extraction using a hybrid Optical Character Recognition (OCR) system built on DenseNet- and Convolutional Recurrent Neural Network (CRNN)-based algorithms; and iv) an LLM agent that analyzes and verifies the accuracy of the results. The proposed model demonstrates strong performance in object detection and accurate localization, with a precision of 100%, a recall of 95.8%, and a mean Average Precision (mAP) of 99.2% at IoU=0.5 and 69.1% at IoU=0.5:0.95. The hybrid OCR system achieved 89% precision, 65% recall, and a 75% F1 score on the Auto-DG dataset, significantly outperforming several mainstream standalone OCR engines and highlighting its reliability for scientific image analysis. The LLM serves both as a reasoning engine and as an intelligent assistant that suggests follow-up steps and verifies the results. This automated, LLM-agent-powered method significantly improves the efficiency and accuracy of scale bar detection and extraction in SEM images, providing a valuable tool for microscopic analysis and advancing the field of scientific imaging.
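To make the four-phase workflow concrete, the sketch below outlines how phases ii-iv could be chained for a single SEM image. It is a minimal, hypothetical orchestration under stated assumptions, not the authors' implementation: the function names (detect_bar, read_label, verify), the ScaleBarResult fields, and the assumption that the OCR output has the form "<value> <unit>" are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical sketch of the pipeline described in the abstract.
# All names and signatures below are illustrative assumptions,
# not the authors' actual code.

@dataclass
class ScaleBarResult:
    bbox: tuple[int, int, int, int]   # scale bar location (x, y, w, h) in pixels
    label_text: str                   # recognized scale text, e.g. "500 nm"
    pixels_per_unit: Optional[float]  # derived calibration (px per labeled unit)
    verified: bool                    # outcome of the LLM-agent consistency check


def extract_scale_bar(
    image_path: str,
    detect_bar: Callable[[str], tuple[int, int, int, int]],  # phase ii: object detector
    read_label: Callable[[str, tuple], str],                  # phase iii: hybrid OCR (DenseNet + CRNN)
    verify: Callable[[dict], bool],                           # phase iv: LLM-agent verification
) -> ScaleBarResult:
    """Run detection, OCR, and verification on one SEM image."""
    bbox = detect_bar(image_path)
    text = read_label(image_path, bbox)

    # Derive pixels-per-unit from the bar length (bbox width) and the label value;
    # assumes the label splits cleanly into a numeric value and a unit string.
    value, unit = text.split()
    pixels_per_unit = bbox[2] / float(value)

    ok = verify({"bbox": bbox, "text": text, "pixels_per_unit": pixels_per_unit})
    return ScaleBarResult(bbox=bbox, label_text=f"{value} {unit}",
                          pixels_per_unit=pixels_per_unit, verified=ok)
```

In such a structure, detect_bar would wrap the trained object detector, read_label the DenseNet/CRNN hybrid OCR, and verify a call to the LLM agent; keeping the three stages behind independent callables is one way to let each component be swapped or evaluated in isolation.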
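As a quick consistency check on the reported OCR metrics, the standard F1 definition ties the three numbers together; plugging in the stated precision and recall reproduces the stated score:

\[
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.89 \times 0.65}{0.89 + 0.65} \approx 0.75
\]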