Bipartite graphs are powerful data structures for modeling interactions between two types of nodes and have been used in a variety of applications, such as recommender systems, information retrieval, and drug discovery. A fundamental challenge for bipartite graphs is how to learn informative node embeddings. Despite the success of recent self-supervised learning methods on bipartite graphs, their objectives discriminate instance-wise positive and negative node pairs, which can introduce cluster-level errors. In this paper, we introduce a novel co-cluster infomax (COIN) framework, which captures cluster-level information by maximizing the mutual information of co-clusters. Unlike previous infomax methods, which estimate mutual information with neural networks, COIN can easily calculate mutual information. Moreover, COIN is an end-to-end co-clustering method that can be trained jointly with other objective functions and optimized via back-propagation. We also provide a theoretical analysis of COIN, proving that it effectively increases the mutual information of node embeddings and that it is upper-bounded by the prior distributions of nodes. We extensively evaluate COIN on various benchmark datasets and tasks to demonstrate its effectiveness.
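To make the idea of a closed-form mutual information between co-clusters concrete, here is a minimal sketch in plain Python. It is not the COIN objective from the paper. It assumes each node on either side of the bipartite graph has a soft cluster assignment (one probability vector per node), builds a joint distribution over cluster pairs by averaging over edges, and evaluates the standard discrete mutual information formula. The function name `cocluster_mutual_information` and the inputs `edges`, `p_u`, and `p_v` are hypothetical names chosen for illustration.

```python
import math


def cocluster_mutual_information(edges, p_u, p_v):
    """Mutual information between two co-cluster variables (illustrative only).

    edges: list of (u, v) index pairs, one per edge of the bipartite graph.
    p_u:   p_u[u][k] is the assumed soft assignment of left node u to cluster k.
    p_v:   p_v[v][l] is the assumed soft assignment of right node v to cluster l.
    """
    n_k, n_l = len(p_u[0]), len(p_v[0])

    # Joint distribution over cluster pairs, averaged over all edges.
    joint = [[0.0] * n_l for _ in range(n_k)]
    for u, v in edges:
        for k in range(n_k):
            for l in range(n_l):
                joint[k][l] += p_u[u][k] * p_v[v][l] / len(edges)

    # Marginals of the joint distribution.
    p_k = [sum(row) for row in joint]
    p_l = [sum(joint[k][l] for k in range(n_k)) for l in range(n_l)]

    # I(K; L) = sum_{k,l} p(k,l) * log(p(k,l) / (p(k) * p(l))), skipping zero cells.
    mi = 0.0
    for k in range(n_k):
        for l in range(n_l):
            if joint[k][l] > 0.0:
                mi += joint[k][l] * math.log(joint[k][l] / (p_k[k] * p_l[l]))
    return mi


# Toy graph: four edges pairing left node i with right node i.
edges = [(0, 0), (1, 1), (2, 2), (3, 3)]
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # clusters agree across sides
uniform = [[0.5, 0.5]] * 4                                   # uninformative assignments

mi_aligned = cocluster_mutual_information(edges, aligned, aligned)
mi_uniform = cocluster_mutual_information(edges, uniform, uniform)
```

In this toy case the aligned assignments give a mutual information of log 2 and the uniform assignments give zero. Because the quantity is a finite sum over cluster pairs, it needs no neural estimator; in a training setting the same expression could presumably be written with an automatic-differentiation library so that it can be maximized by back-propagation.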