While Graph Neural Networks (GNNs) have recently become the de facto standard for modeling relational data, they rely on the strong assumption that the node or edge features of the graph are fully available. In many real-world applications, however, features are only partially available; in social networks, for example, age and gender are known only for a small subset of users. We present a general approach for handling missing features in graph machine learning applications, based on minimization of the Dirichlet energy, which leads to a diffusion-type differential equation on the graph. Discretizing this equation yields a simple, fast, and scalable algorithm that we call Feature Propagation. We show experimentally that the proposed approach outperforms previous methods on seven common node-classification benchmarks and can withstand surprisingly high rates of missing features: on average, we observe only around a 4% relative drop in accuracy when 99% of the features are missing. Moreover, it takes only 10 seconds to run on a graph with $\sim$2.5M nodes and $\sim$123M edges on a single GPU.
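As a concrete illustration of the procedure sketched above, the following is a minimal sketch of the Feature Propagation iteration: features are repeatedly diffused over a symmetrically normalized adjacency matrix, and the observed entries are reset to their known values after every step. The function name, the zero initialization of missing rows, and the number of iterations are illustrative assumptions for this sketch, not taken from the paper's released code.

```python
import numpy as np
import scipy.sparse as sp


def feature_propagation(adj, x, known_mask, num_iters=40):
    """Reconstruct missing node features by diffusion on the graph.

    adj        : (n, n) scipy sparse adjacency matrix (undirected, unweighted)
    x          : (n, d) feature matrix; rows with missing features may be zero-filled
    known_mask : (n,) boolean array, True where a node's features are observed
    """
    # Symmetrically normalized adjacency D^{-1/2} A D^{-1/2}
    deg = np.asarray(adj.sum(axis=1)).ravel()
    with np.errstate(divide="ignore"):
        d_inv_sqrt = 1.0 / np.sqrt(deg)
    d_inv_sqrt[~np.isfinite(d_inv_sqrt)] = 0.0  # isolated nodes get zero weight
    d_mat = sp.diags(d_inv_sqrt)
    adj_norm = (d_mat @ adj @ d_mat).tocsr()

    x = np.asarray(x, dtype=float).copy()
    x_known = x[known_mask].copy()
    for _ in range(num_iters):
        x = np.asarray(adj_norm @ x)  # diffuse features to neighbours
        x[known_mask] = x_known       # reset observed features to their true values
    return x
```

In this sketch the reconstructed feature matrix would simply be handed to an off-the-shelf GNN for the downstream node-classification task; a few tens of diffusion iterations are assumed to be enough for the reconstruction to stabilize.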