Analysing gas chromatography-mass spectrometry (GC-MS) data poses many challenges. Many of them stem from electron ionisation, whose high degree of fragmentation and concomitant loss of molecular ion signal make it difficult to recover molecular information. In GC-MS data, closely eluting peaks often share many fragment ions, so sophisticated analysis methods are needed. Some of these methods are fully automated, but they make assumptions about the data that can introduce artifacts during the analysis. Chemometric methods such as Multivariate Curve Resolution or Parallel Factor Analysis are particularly attractive because they are flexible and make relatively few assumptions about the data, ideally resulting in fewer artifacts. However, these methods require expert user intervention to determine the most relevant regions of interest and an appropriate number of components, $k$, for each region. Automated region of interest selection is therefore needed to permit automated batch processing of chromatographic data with advanced signal deconvolution. Here, we propose a new method for automated, untargeted region of interest selection that accounts for the multivariate information present in GC-MS data. Regions of interest are selected based on the ratio of the squared first and second singular values from the Singular Value Decomposition of a window that moves across the chromatogram. Assuming that the first singular value accounts largely for signal and that the second singular value accounts largely for noise, the relationship between these two values can be interpreted as a probabilistic distribution of Fisher Ratios. The sensitivity of the algorithm was tested by determining the concentration at which it can no longer pick out chromatographic regions known to contain signal.
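The moving-window statistic described above can be illustrated with a minimal sketch. It assumes the GC-MS data are arranged as a matrix with scans (retention time) in rows and m/z channels in columns. The function name `singular_value_ratios`, the `window` parameter, and the synthetic data are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def singular_value_ratios(data, window):
    """Return s1^2 / s2^2 for every position of a window sliding over scans.

    data   : 2-D array, rows = scans (retention time), columns = m/z channels.
    window : number of consecutive scans per window (hypothetical parameter).
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[1] < 2:
        raise ValueError("data must be 2-D with at least two m/z channels")
    n_scans = data.shape[0]
    if not 2 <= window <= n_scans:
        raise ValueError("window must be between 2 and the number of scans")

    n_positions = n_scans - window + 1
    ratios = np.empty(n_positions)
    tiny = np.finfo(float).tiny  # guards against division by zero
    for start in range(n_positions):
        block = data[start:start + window]
        # Singular values are returned in descending order.
        s = np.linalg.svd(block, compute_uv=False)
        ratios[start] = s[0] ** 2 / max(s[1] ** 2, tiny)
    return ratios


# Synthetic illustration (assumed data, not from the paper):
# one Gaussian elution profile at scan 100 on top of white noise.
rng = np.random.default_rng(0)
n_scans, n_mz = 200, 30
scans = np.arange(n_scans)
elution = np.exp(-((scans - 100) ** 2) / (2 * 3.0 ** 2))
spectrum = rng.uniform(0.0, 1.0, n_mz)
chromatogram = 50.0 * np.outer(elution, spectrum) + rng.normal(size=(n_scans, n_mz))

ratios = singular_value_ratios(chromatogram, window=10)
```

In this sketch, windows that contain only noise give ratios close to one, while windows overlapping the simulated peak give much larger values. Turning the ratios into probabilities would require comparing them against an F distribution with suitable degrees of freedom, which this abstract does not specify, so that step is left out here.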