Recent research has demonstrated that large language models (LLMs) can support experts across various domains, including game design. In this study, we examine the utility of medium-sized LLMs: models that run on the consumer-grade hardware typically available in small studios or home environments. We began by identifying ten key aspects that contribute to a strong game concept and used ChatGPT to generate thirty sample game ideas. Three medium-sized LLMs (LLaMA 3.1, Qwen 2.5, and DeepSeek-R1) were then prompted to evaluate these ideas against the identified aspects. In a qualitative assessment, two researchers compared the models' outputs and found that DeepSeek-R1 produced the most consistently useful feedback, despite some variability in quality. To explore real-world applicability, we ran a pilot study with ten students enrolled in a storytelling course for game development. In the early stages of their own projects, students used our prompt and DeepSeek-R1 to refine their game concepts. The results indicate a positive reception: most participants rated the output as high quality and expressed interest in incorporating such tools into their workflows. These findings suggest that current medium-sized LLMs can provide valuable feedback in early game design, though further refinement of prompting methods could improve consistency and overall effectiveness.