Graphs are a powerful tool for modeling data with an underlying structure in non-Euclidean space, encoding relations as edges and entities as nodes. Despite years of progress in learning from graph-structured data, one obstacle persists: graph imbalance. Existing attempts to address this problem consider only class-level imbalance. In this work, we argue that for graphs, imbalance is also likely to exist at the level of sub-class topology groups. Because topology structures are flexible, graphs can be highly diverse, which makes it difficult to learn a generalizable classification boundary. As a result, a few majority topology groups may dominate the learning process, leaving the others under-represented. To address this problem, we propose a new framework, {\method}, and design (1) a topology extractor, which automatically identifies the topology group of each instance using explicit memory cells, and (2) a training modulator, which modulates the learning process of the target GNN model to prevent topology-group-wise under-representation. {\method} can be used as a key component in GNN models to improve their performance in the imbalanced-data setting. We analyze both topology-level imbalance and the proposed {\method} theoretically, and we empirically verify its effectiveness with both node-level and graph-level classification as the target tasks.
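As a rough illustration of the two components described above, the following sketch pairs a group-assignment step with a reweighting step. It is not the implementation of {\method}: the abstract does not specify how memory cells are matched or how training is modulated, so nearest-prototype assignment and inverse-frequency weighting are used here as simple stand-ins. All names (`assign_topology_groups`, `instance_weights`, `modulated_loss`) and the toy embeddings are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def assign_topology_groups(
    embeddings: Sequence[Sequence[float]],
    memory_cells: Sequence[Sequence[float]],
) -> List[int]:
    """Stand-in extractor: map each instance to its nearest memory cell.

    Assumption: each instance already has a structural embedding and each
    memory cell is a prototype vector in the same space.
    """
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(memory_cells)), key=lambda k: sq_dist(e, memory_cells[k]))
        for e in embeddings
    ]


def instance_weights(groups: Sequence[int]) -> List[float]:
    """Stand-in modulator: inverse group-frequency weights, averaging to 1."""
    counts = Counter(groups)
    raw = [1.0 / counts[g] for g in groups]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def modulated_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-instance losses."""
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)


# Toy data: three instances near one prototype, one instance near another.
embeddings = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [1.0, 1.0]]
memory_cells = [[0.0, 0.0], [1.0, 1.0]]

groups = assign_topology_groups(embeddings, memory_cells)
weights = instance_weights(groups)
loss = modulated_loss([0.2, 0.2, 0.2, 0.8], weights)
```

With these weights, each topology group contributes the same total weight to the loss regardless of its size, so the single minority-group instance is not drowned out by the three majority-group instances.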