Innovation in artificial intelligence (AI) has always depended on technological infrastructures, from code repositories to computing hardware. Yet industry, rather than universities, has become increasingly influential in shaping AI innovation. As generative forms of AI powered by large language models (LLMs) have driven the breakout of AI into the wider world, the AI community has sought new methods for independently evaluating the performance of AI models. How best, in other words, to compare AI models against one another, and how best to account for new models launched on a near-daily basis? Building on recent work in media studies, science and technology studies (STS), and computer science on benchmarking and the practices of AI evaluation, I examine the rise of so-called 'arenas' in which AI models are evaluated through gladiatorial-style 'battles'. Through a technography of a leading user-driven AI model evaluation platform, LMArena, I consider five themes central to the emerging 'arena-ization' of AI innovation. I argue that arena-ization is powered by a 'viral' desire to capture attention both inside and outside the AI community, attention that is critical to the scaling and commercialization of AI products. In the discussion, I reflect on the implications of 'arena gaming', a phenomenon through which model developers hope to capture attention.