X-ray scattering experiments using X-ray Free Electron Lasers (XFELs) are a powerful tool for determining the molecular structure and function of unknown samples (such as COVID-19 viral proteins). XFEL experiments challenge computing in two ways: i) due to the high cost of running XFELs, a fast turnaround time from data acquisition to data analysis is essential to make informed decisions on experimental protocols; ii) data collection rates are growing exponentially, requiring new scalable algorithms. Here we report our experiences analyzing data from two experiments at the Linac Coherent Light Source (LCLS) during September 2020. Raw data were analyzed on NERSC's Cori XC40 system, using the Superfacility paradigm: our workflow automatically moves raw data between LCLS and NERSC, where they are analyzed using the software package CCTBX. We achieved real-time data analysis, with a turnaround time from data acquisition to full molecular reconstruction of as little as 10 minutes, which is fast enough for the experiment's operators to make informed decisions. By hosting the data analysis on Cori, and by automating LCLS-NERSC interoperability, we achieved a data analysis rate that matches the data acquisition rate. Completing data analysis within 10 minutes is a first for XFEL experiments and an important milestone if we are to keep up with data collection trends.