Søgaard (2020) obtained results suggesting that the fraction of trees in the test data that are isomorphic to trees in the training set accounts for non-trivial variation in parser performance. As in other statistical analyses in NLP, the results were based on evaluating linear regressions. However, the study had methodological issues and was undertaken with a small sample size, leading to unreliable results. We present a replication study in which we also bin sentences by length and find that only a small subset of sentences vary in performance with respect to graph isomorphism. Further, the correlation observed between parser performance and graph isomorphism in the wild disappears when controlling for covariates. However, in a controlled experiment, where covariates are kept fixed, we do observe a strong correlation. We suggest that conclusions drawn from statistical analyses like this need to be tempered and that controlled experiments can complement them by more readily teasing factors apart.
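The central quantity in the study is whether a test-set dependency tree is isomorphic (as an unlabeled tree) to some tree in the training set. A minimal sketch of that check, using `networkx` (illustrative only; the original study's implementation and tree representation are assumptions here):

```python
import networkx as nx

def tree_isomorphic(edges_a, edges_b):
    """Check whether two dependency trees, given as lists of
    (head, dependent) edges, are isomorphic as unlabeled trees."""
    # Build undirected graphs from the edge lists; for trees,
    # unlabeled isomorphism ignores token identities and order.
    ta = nx.Graph(edges_a)
    tb = nx.Graph(edges_b)
    return nx.is_isomorphic(ta, tb)

# Two trees with different token indices but the same shape
# (a star: one head governing three dependents).
train_tree = [(0, 1), (1, 2), (1, 3)]
test_tree = [(0, 2), (2, 1), (2, 3)]
print(tree_isomorphic(train_tree, test_tree))  # True
```

The fraction Søgaard (2020) regresses on would then be the share of test trees for which this check succeeds against at least one training tree.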