Sequence classification has a wide range of real-world applications across domains, such as genome classification in healthcare and anomaly detection in business. However, the lack of explicit features in sequence data makes it difficult for machine learning models to learn from such data. Although Neural Network (NN) models address this by learning features automatically, they are limited to capturing adjacent structural connections and ignore global, higher-order information among sequences. To address these challenges in sequence classification, we propose a novel Hypergraph Attention Network model, namely Seq-HyGAN. To capture the complex structural similarity among sequences, we first create a hypergraph in which sequences are represented as hyperedges and the subsequences extracted from them are represented as nodes. We then introduce an attention-based Hypergraph Neural Network model that uses a two-level attention mechanism. This model generates a representation of each sequence as a hyperedge while simultaneously learning which subsequences are crucial for that sequence. We conduct extensive experiments on four datasets to evaluate our model and compare it with several state-of-the-art methods. Experimental results demonstrate that Seq-HyGAN classifies sequence data effectively and significantly outperforms the baselines. We also conduct case studies to investigate the contribution of each module in Seq-HyGAN.
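The hypergraph construction described in the abstract (sequences as hyperedges, extracted subsequences as nodes) can be illustrated with a minimal sketch. The abstract does not specify how subsequences are extracted, so the sketch assumes fixed-length overlapping substrings (k-mers) and a binary incidence matrix; the names `extract_kmers` and `build_hypergraph` and the toy DNA strings are hypothetical and are not taken from the Seq-HyGAN implementation.

```python
def extract_kmers(sequence, k):
    """Return the set of distinct length-k substrings (k-mers) of a sequence.

    Assumption: subsequences are overlapping k-mers; the abstract only says
    'subsequences extracted from sequences'.
    """
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_hypergraph(sequences, k=3):
    """Build a hypergraph where each sequence is one hyperedge.

    Returns:
        nodes: sorted list of all distinct k-mers (the hypergraph nodes)
        hyperedges: list of sets, one set of k-mers per sequence
        incidence: |nodes| x |sequences| matrix (list of lists) with
                   incidence[i][j] = 1 if node i occurs in sequence j, else 0
    """
    hyperedges = [extract_kmers(s, k) for s in sequences]
    nodes = sorted(set().union(*hyperedges))
    incidence = [
        [1 if node in edge else 0 for edge in hyperedges]
        for node in nodes
    ]
    return nodes, hyperedges, incidence


# Toy example with three short DNA-like strings (illustrative data only).
sequences = ["ACGTAC", "CGTACG", "TTTACG"]
nodes, hyperedges, incidence = build_hypergraph(sequences, k=3)

for node, row in zip(nodes, incidence):
    print(node, row)
```

In this representation, two sequences that share many subsequences are connected through many common nodes, which is the kind of higher-order similarity a hypergraph can encode. An attention-based model would then operate on the incidence structure, weighting nodes within each hyperedge and hyperedges around each node; that part is not sketched here because the abstract gives no details of the attention formulation.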