Despite its pivotal role in research experiments, code correctness is often presumed only on the basis of the perceived quality of the results. This comes with the risk of erroneous outcomes and potentially misleading findings. To address this issue, we posit that the current focus on result reproducibility should go hand in hand with an emphasis on coding best practices. We bolster our call to the NLP community by presenting a case study in which we identify (and correct) three bugs in widely used open-source implementations of the state-of-the-art Conformer architecture. Through comparative experiments on automatic speech recognition and translation in various language settings, we demonstrate that the existence of bugs does not prevent the achievement of good and reproducible results, yet can lead to incorrect conclusions that potentially misguide future research. In response, this study is a call to action toward the adoption of coding best practices aimed at fostering correctness and improving the quality of the developed software.
Reproducibility is Nothing without Correctness: The Importance of Testing Code in NLP