Following an extensive simulation study comparing the operating characteristics of three different procedures for establishing equivalence (the frequentist "TOST", the Bayesian "HDI-ROPE", and the Bayes factor interval null procedure), Linde et al. (2021) conclude with the recommendation that "researchers rely more on the Bayes factor interval null approach for quantifying evidence for equivalence." We redo the simulation study of Linde et al. (2021) in its entirety, but with the different procedures calibrated to have the same predetermined maximum type 1 error rate. Our results suggest that the Bayes factor, HDI-ROPE, and frequentist equivalence testing are all essentially equivalent when it comes to predicting equivalence. In general, any advocacy of frequentist testing as better or worse than Bayesian testing on the basis of empirical findings seems dubious at best. Once one decides which underlying principle to subscribe to in tackling a given problem, the method follows naturally. Bearing in mind that each procedure can be reverse-engineered from the others (at least approximately), trying to use empirical performance to argue for one approach over another seems like tilting at windmills.
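As a small illustration of the claim that the procedures can be reverse-engineered from one another, here is a minimal sketch under simplifying assumptions that are not taken from the study: a one-sample normal mean with known standard deviation, a flat prior for the Bayesian posterior, and hypothetical function names. In this setting, TOST at level alpha and HDI-ROPE using a (1 - 2*alpha) HDI make identical decisions. A Monte Carlo check with the true mean placed on the equivalence bound gives a type 1 error rate close to alpha. The Bayes factor interval null procedure is not covered here.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tost_z(xbar, sigma, n, margin, alpha=0.05):
    """Hypothetical TOST with two one-sided z-tests (known sigma is an assumption).

    Declares equivalence when both H0: mu <= -margin and H0: mu >= margin
    are rejected at level alpha.
    """
    se = sigma / math.sqrt(n)
    p_lower = 1 - STD_NORMAL.cdf((xbar + margin) / se)  # test of H0: mu <= -margin
    p_upper = STD_NORMAL.cdf((xbar - margin) / se)      # test of H0: mu >= margin
    return max(p_lower, p_upper) < alpha


def hdi_rope_flat(xbar, sigma, n, margin, alpha=0.05):
    """Hypothetical HDI-ROPE under a flat prior (assumption).

    The posterior for mu is N(xbar, se^2), so its (1 - 2*alpha) HDI is
    xbar +/- z_{1-alpha} * se. Equivalence is declared when that HDI lies
    strictly inside the ROPE (-margin, margin).
    """
    se = sigma / math.sqrt(n)
    z = STD_NORMAL.inv_cdf(1 - alpha)
    lo, hi = xbar - z * se, xbar + z * se
    return -margin < lo and hi < margin


def type1_rate(decide, margin, sigma, n, alpha=0.05, reps=20000, seed=1):
    """Monte Carlo rate of declaring equivalence when the true mean equals +margin."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = rng.gauss(margin, se)  # sample mean drawn with mu on the equivalence bound
        hits += decide(xbar, sigma, n, margin, alpha)
    return hits / reps


if __name__ == "__main__":
    grid = [i / 100 for i in range(-100, 101)]
    same = all(tost_z(x, 1.0, 100, 0.5) == hdi_rope_flat(x, 1.0, 100, 0.5) for x in grid)
    print("identical decisions on grid:", same)
    print("TOST type 1 rate at bound:", type1_rate(tost_z, 0.5, 1.0, 100))
```

Outside this conjugate, known-variance case the correspondence is only approximate, which is in line with the hedge "at least approximately" above.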