We present a novel architecture for safely integrating Large Language Models (LLMs) into interactive game engines, allowing players to "program" new behaviors using natural language. Our framework mitigates the safety risks of executing model output directly by using an LLM to translate player commands into a constrained Domain-Specific Language (DSL), which configures a custom Entity-Component-System (ECS) at runtime. We evaluated this system in a 2D spell-crafting game prototype by experimentally assessing models from the Gemini, GPT, and Claude families under various prompting strategies. A validated LLM judge qualitatively rated the outputs, showing that while larger models better captured creative intent, the optimal prompting strategy is task-dependent: Chain-of-Thought prompting improved creative alignment, while few-shot examples were necessary for generating more complex DSL scripts. This work offers a validated LLM-ECS pattern for emergent gameplay and a quantitative performance comparison for developers.
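The LLM-to-DSL-to-ECS pattern can be illustrated with a minimal sketch. The Python code below is hypothetical: the component vocabulary (`Projectile`, `DamageOverTime`, etc.), the DSL structure, and the `translate_command` stand-in for the LLM call are assumptions for illustration and do not reproduce the paper's actual grammar, prompts, or engine code.

```python
# Minimal sketch of the LLM -> constrained DSL -> ECS pattern described above.
# Component names, DSL schema, and translate_command() are hypothetical.
from dataclasses import dataclass, field

# Closed vocabulary: the DSL can only reference components the engine already trusts.
ALLOWED_COMPONENTS = {"Projectile", "AreaEffect", "DamageOverTime"}

@dataclass
class Entity:
    components: dict = field(default_factory=dict)

def translate_command(nl_command: str) -> dict:
    """Stand-in for the LLM call: the model turns the player's natural-language
    request into a structured DSL script (here, a dict of component configs)."""
    return {"Projectile": {"speed": 12.0}, "DamageOverTime": {"dps": 3.0, "duration": 4.0}}

def apply_script(script: dict, entity: Entity) -> Entity:
    """Configure the ECS entity at runtime, rejecting anything outside the DSL."""
    for name, params in script.items():
        if name not in ALLOWED_COMPONENTS:
            raise ValueError(f"Component '{name}' is not permitted by the DSL")
        entity.components[name] = params
    return entity

spell = apply_script(translate_command("a slow-burning fire dart"), Entity())
print(spell.components)
```

The design point the sketch conveys is that the LLM never emits executable code; it only emits data in a constrained schema, and the engine validates that data before configuring entities.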