Strategic decision-making in Pokémon battles presents a unique testbed for evaluating large language models (LLMs). Pokémon battles demand reasoning about type matchups, statistical trade-offs, and risk assessment, skills that mirror human strategic thinking. This work examines whether LLMs can serve as competent battle agents, capable of both making tactically sound decisions and generating novel, balanced game content. We developed a turn-based Pokémon battle system in which LLMs select moves based on battle state rather than pre-programmed logic. The framework captures essential Pokémon mechanics: type-effectiveness multipliers, stat-based damage calculations, and multi-Pokémon team management. Through systematic evaluation across multiple model architectures, we measured win rates, decision latency, type-alignment accuracy, and token efficiency. These results suggest that LLMs can function as dynamic game opponents without domain-specific training, offering a practical alternative to reinforcement learning for turn-based strategic games. This dual capability of tactical reasoning and content creation positions LLMs as both players and designers, with implications for procedural generation and adaptive difficulty systems in interactive entertainment.
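The core mechanics the framework captures can be sketched in a few lines. This is a minimal illustrative example, not the paper's implementation: the type-chart subset, stat values, and formula shape are assumptions, loosely following the well-known Pokémon damage formula (base damage scaled by a type-effectiveness multiplier).

```python
# Illustrative sketch of type-effectiveness multipliers and stat-based
# damage calculation. TYPE_CHART is a small assumed subset; unlisted
# matchups default to a neutral 1.0 multiplier.

TYPE_CHART = {  # (attacker move type, defender type) -> multiplier
    ("Water", "Fire"): 2.0,
    ("Fire", "Water"): 0.5,
    ("Electric", "Water"): 2.0,
}

def effectiveness(move_type: str, defender_type: str) -> float:
    """Look up the type-effectiveness multiplier, defaulting to neutral."""
    return TYPE_CHART.get((move_type, defender_type), 1.0)

def damage(level: int, power: int, atk: int, defense: int,
           move_type: str, defender_type: str) -> int:
    """Stat-based damage, shaped like the classic Pokémon formula."""
    base = ((2 * level / 5 + 2) * power * atk / defense) / 50 + 2
    return int(base * effectiveness(move_type, defender_type))

# A Water-type move against a Fire-type defender deals double damage.
print(damage(50, 90, 120, 100, "Water", "Fire"))  # prints 99
```

A battle agent (LLM-driven or otherwise) would consult this kind of function when ranking candidate moves against the current battle state.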