Rapidly extracting actionable information from data produced by instruments such as the Linac Coherent Light Source (LCLS-II) and the Advanced Photon Source Upgrade (APS-U) is becoming ever more challenging because of high data rates (up to TB/s). Conventional physics-based information retrieval methods are hard-pressed to detect interesting events fast enough to enable timely focusing on a rare event or correction of an error. Machine learning (ML) methods that learn cheap surrogate classifiers are a promising alternative, but they can fail catastrophically when changes in the instrument or sample degrade ML performance. To overcome such difficulties, we present a new data storage and ML model training architecture designed to organize large volumes of data and models so that, when model degradation is detected, prior models and/or data can be queried rapidly and a more suitable model retrieved and fine-tuned for the new conditions. We show that our approach can achieve up to a 100x speedup in data labeling compared to the current state of the art, a 200x improvement in training speed, and a 92x speedup in end-to-end model updating time.
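The retrieve-on-degradation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The names `ModelRecord`, `ModelStore`, `has_degraded`, and `select_model`, the use of a fixed-length data "fingerprint" vector, the Euclidean distance, and the error threshold of 0.2 are all assumptions made purely for illustration. The fine-tuning step that would follow retrieval is omitted.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelRecord:
    """One stored model plus a summary of the data it was trained on."""
    model_id: str
    # Hypothetical summary of the training data, e.g. a mean feature vector.
    fingerprint: List[float]


@dataclass
class ModelStore:
    """Hypothetical in-memory index of prior models (an assumption, not the paper's design)."""
    records: List[ModelRecord] = field(default_factory=list)

    def add(self, record: ModelRecord) -> None:
        self.records.append(record)

    def nearest(self, fingerprint: List[float]) -> ModelRecord:
        """Return the record whose training-data fingerprint is closest to the query."""
        if not self.records:
            raise LookupError("model store is empty")
        return min(self.records, key=lambda r: math.dist(r.fingerprint, fingerprint))


def has_degraded(recent_errors: List[float], threshold: float = 0.2) -> bool:
    """Flag degradation when the mean recent error exceeds an assumed threshold."""
    if not recent_errors:
        return False
    return sum(recent_errors) / len(recent_errors) > threshold


def select_model(store: ModelStore, current_id: str,
                 recent_errors: List[float], new_fingerprint: List[float]) -> str:
    """Keep the current model unless it has degraded; otherwise pick the
    stored model trained on the most similar data as a fine-tuning start."""
    if not has_degraded(recent_errors):
        return current_id
    return store.nearest(new_fingerprint).model_id
```

In this sketch, a caller would monitor recent classification errors, and when `has_degraded` fires, pass a fingerprint of the newly arriving data to `select_model` to obtain the identifier of a candidate model to fine-tune. A production system would likely replace the linear scan in `nearest` with an indexed query.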