We propose a new algebraic-topological framework that extracts intrinsic information from MALDI data and transforms it to reflect the topological persistence in the data. The framework has two main advantages. First, topological persistence helps distinguish signal from noise. Second, it compresses the MALDI data, which saves storage space and reduces the computational time of subsequent classification tasks. We introduce an algorithm that implements the framework and depends on a single tuning parameter, and we show that it is computationally efficient. After the persistence extraction, logistic regression and random forest classifiers are applied to the resulting persistence transformation diagrams to classify the observational units into two classes corresponding to lung cancer subtypes. Finally, we apply the proposed framework to a real-world MALDI data set and illustrate the competitiveness of the methods via cross-validation.
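To make the idea of persistence-based denoising and compression concrete, the following is a minimal sketch, not the algorithm of this work. It assumes a spectrum is given as a one-dimensional list of intensities, computes standard 0-dimensional persistence of its peaks under a superlevel-set filtration, and keeps only peaks whose persistence reaches a threshold. The function names `peak_persistence` and `filter_by_persistence` and the threshold `tau` are hypothetical; in particular, treating the single tuning parameter as a persistence threshold is an assumption made for illustration only.

```python
from typing import List, Tuple

Pair = Tuple[int, float, float]  # (peak index, birth height, death height)


def peak_persistence(signal: List[float]) -> List[Pair]:
    """0-dimensional persistence of the peaks of a 1-D signal.

    Points are added from highest to lowest intensity. A component is born
    at a local maximum and dies when it merges into a component with a
    higher peak (the usual elder rule).
    """
    n = len(signal)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: -signal[i])
    parent = [-1] * n  # -1 marks a point that has not been added yet
    pairs: List[Pair] = []

    def find(i: int) -> int:
        # Union-find lookup with path halving; the root is the component's peak.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # The component with the higher peak survives the merge.
                hi, lo = (ri, rj) if signal[ri] > signal[rj] else (rj, ri)
                if lo != i:  # the point just added is not a peak of its own
                    pairs.append((lo, signal[lo], signal[i]))
                parent[lo] = hi

    # The global maximum never merges; pair it with the global minimum.
    top = find(order[0])
    pairs.append((top, signal[top], min(signal)))
    return pairs


def filter_by_persistence(signal: List[float], tau: float) -> List[Pair]:
    """Keep only peaks whose persistence (birth - death) is at least tau."""
    return [p for p in peak_persistence(signal) if p[1] - p[2] >= tau]


if __name__ == "__main__":
    spectrum = [0, 3, 2, 5, 1, 4, 0]  # toy intensities, not real MALDI data
    for index, birth, death in filter_by_persistence(spectrum, tau=2):
        print(index, birth, death)
```

In this sketch the retained (birth, death) pairs play the role of a compressed representation: low-persistence peaks are treated as noise and dropped, and the surviving pairs could then be turned into features for a classifier such as logistic regression or a random forest.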