Relation classification models are conventionally evaluated using only a single measure, e.g., micro-F1, macro-F1 or AUC. In this work, we analyze weighting schemes, such as micro and macro, for imbalanced datasets. We introduce a framework for weighting schemes in which the existing schemes are the extremes, and propose two new intermediate schemes. We show that reporting results under different weighting schemes better highlights the strengths and weaknesses of a model.
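To illustrate why the choice of weighting scheme matters on imbalanced data, the following minimal sketch computes micro-F1 and macro-F1 for a toy single-label task. The helper names (`per_class_counts`, `micro_f1`, `macro_f1`) and the toy labels are assumptions made for this illustration; the sketch covers only the two existing schemes named above, not the intermediate schemes, and it averages over all classes without excluding any negative class.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # gold class t was missed
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true, y_pred):
    """Pool counts over all classes, so every instance weighs the same."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    return f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))


def macro_f1(y_true, y_pred):
    """Average per-class F1, so every class weighs the same."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)


# Hypothetical imbalanced toy data: 8 instances of class "A", 2 of class "B",
# and a model that always predicts the majority class.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10

print(f"micro-F1: {micro_f1(y_true, y_pred):.3f}")
print(f"macro-F1: {macro_f1(y_true, y_pred):.3f}")
```

On this toy data the majority-class predictor scores much higher under micro weighting than under macro weighting, because the pooled counts are dominated by the frequent class while the per-class average penalizes the missed rare class equally.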