Selecting data points for training a machine learning model is often challenging. Abundant data is frequently available, but training requires labels, and labeling is often costly. We therefore need to select training points efficiently, so that a model trained on the selected points outperforms models trained on other training sets of the same size. We propose a novel method for selecting nodes in graph datasets based on graph centrality. We present two variants: a smart selection strategy, in which the model needs to be trained only once, and an active learning method. We tested this idea on three popular graph datasets (Cora, Citeseer, and Pubmed), and the results are encouraging.
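The abstract does not say which centrality measure the method uses. Purely as an illustration of centrality-based selection under a fixed labeling budget, the sketch below assumes degree centrality on an undirected graph and ranks all nodes once, then labels the top `budget` nodes (a one-shot setting with no retraining loop). The function names `degree_centrality` and `select_nodes_by_centrality`, the edge-list input format, and the tie-breaking rule are hypothetical choices, not the authors' implementation.

```python
def degree_centrality(edges, num_nodes):
    """Normalized degree centrality, degree / (n - 1), for an undirected graph.

    Assumption: nodes are integers 0..num_nodes-1 and `edges` is a list of
    (u, v) pairs; duplicate edges and self-loops are ignored.
    """
    # Deduplicate undirected edges and drop self-loops.
    unique_edges = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    degree = [0] * num_nodes
    for u, v in unique_edges:
        degree[u] += 1
        degree[v] += 1
    denom = max(num_nodes - 1, 1)  # avoid division by zero for tiny graphs
    return [d / denom for d in degree]


def select_nodes_by_centrality(edges, num_nodes, budget):
    """Hypothetical one-shot selection: return the `budget` most central nodes.

    Ties are broken by the lower node id so the result is deterministic.
    """
    scores = degree_centrality(edges, num_nodes)
    ranked = sorted(range(num_nodes), key=lambda n: (-scores[n], n))
    return ranked[:budget]


if __name__ == "__main__":
    # Toy graph: node 0 is a hub; node 4 hangs off node 3.
    toy_edges = [(0, 1), (0, 2), (0, 3), (3, 4), (1, 2)]
    print(select_nodes_by_centrality(toy_edges, num_nodes=5, budget=2))
```

On the toy graph, node 0 has degree 3 and nodes 1, 2, and 3 have degree 2, so a budget of 2 selects nodes 0 and 1. Swapping in another centrality (for example PageRank or betweenness) only changes the scoring function; the ranking-and-budget step stays the same.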