Evaluation is the central means for assessing, understanding, and communicating about NLP models. In this position paper, we argue that evaluation should be more than that: it is a force for driving change, carrying a sociological and political character beyond its technical dimensions. As a force, evaluation's power arises from its adoption: in our view, evaluation succeeds when it achieves the desired change in the field. Further, by framing evaluation as a force, we consider how it competes with other forces. From this analysis, we conjecture that the current trajectory of NLP suggests evaluation's power is waning, in spite of its potential for realizing more pluralistic ambitions in the field. We conclude by discussing the legitimacy of this power, who acquires it, and how it is distributed. Ultimately, we hope the research community will more aggressively harness evaluation for change.