Prediction using the ground truth sounds like an oxymoron in machine learning. However, such an unrealistic setting has been used in hundreds, if not thousands, of papers on learning graph representations. When the learned representations are evaluated on multi-label node classification, many works assume at prediction time that the number of labels of each test instance is known. In practice such ground-truth information is rarely available, yet we point out that this inappropriate setting is now ubiquitous in the research area. We investigate in detail why the situation occurs. Our analysis indicates that with this unrealistic information, performance is likely to be over-estimated. To see why suitable predictions were not used, we identify difficulties in applying some multi-label techniques. For use in future studies, we propose simple and effective settings that do not rely on practically unknown information. Finally, we take this opportunity to conduct a fair and serious comparison of major graph-representation learning methods on multi-label node classification.
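To make the contrast concrete, here is a minimal sketch, assuming each test node already has one decision score per label (for example, from one-vs-rest classifiers trained on the node embeddings). The function names, the toy scores, and the zero threshold with a top-1 fallback are all hypothetical and chosen only for illustration; the threshold rule is one simple realistic alternative, not necessarily the setting proposed in this work, and the numbers are not experimental results.

```python
from typing import List, Set

def predict_with_known_counts(scores: List[List[float]],
                              truths: List[Set[int]]) -> List[Set[int]]:
    """Unrealistic protocol: keep the top-k_i labels, where k_i = |true label set|.
    The count k_i is read from the ground truth, i.e. it leaks test information."""
    preds = []
    for s, y in zip(scores, truths):
        ranked = sorted(range(len(s)), key=lambda j: s[j], reverse=True)
        preds.append(set(ranked[:len(y)]))
    return preds

def predict_by_threshold(scores: List[List[float]],
                         threshold: float = 0.0) -> List[Set[int]]:
    """Realistic protocol (assumed heuristic): predict every label whose score exceeds
    the threshold; if none does, fall back to the single highest-scoring label."""
    preds = []
    for s in scores:
        chosen = {j for j, v in enumerate(s) if v > threshold}
        if not chosen:
            chosen = {max(range(len(s)), key=lambda j: s[j])}
        preds.append(chosen)
    return preds

def micro_f1(preds: List[Set[int]], truths: List[Set[int]]) -> float:
    """Micro-averaged F1 over all (instance, label) pairs."""
    tp = sum(len(p & t) for p, t in zip(preds, truths))
    fp = sum(len(p - t) for p, t in zip(preds, truths))
    fn = sum(len(t - p) for p, t in zip(preds, truths))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical toy decision scores: 3 test nodes, 4 labels.
scores = [
    [1.2, -0.3, 0.4, -1.0],
    [-0.2, -0.5, -0.1, -0.9],
    [0.8, 0.6, -0.4, 0.1],
]
truths = [{0, 2}, {1, 2}, {0}]

known = predict_with_known_counts(scores, truths)
thresholded = predict_by_threshold(scores)
print("known label counts:", micro_f1(known, truths))
print("threshold at 0:    ", micro_f1(thresholded, truths))
```

On these toy values the known-count protocol scores 0.8 while the threshold rule scores 8/11 (about 0.727), because knowing k_i lets the first protocol avoid the extra false positives on the third node. This mirrors, in miniature, the over-estimation the abstract warns about.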