Helios: A Foundational Language Model for Smart Energy Knowledge Reasoning and Application

In the global drive toward carbon neutrality, deeply coordinated smart energy systems underpin industrial transformation. However, expertise in this domain is interdisciplinary, fragmented, and fast-evolving, and general-purpose large language models (LLMs) lack both domain knowledge and awareness of physical constraints, so they cannot deliver precise, engineering-aligned inference and generation. To address these challenges, we introduce Helios, an LLM tailored to the smart energy domain, together with a comprehensive suite of resources to advance LLM research in this field. Specifically, we develop Enersys, a multi-agent collaborative framework for end-to-end dataset construction, through which we produce: (1) a smart energy knowledge base, EnerBase, to enrich the model's foundational expertise; (2) an instruction fine-tuning dataset, EnerInstruct, to strengthen performance on domain-specific downstream tasks; and (3) a reinforcement learning from human feedback (RLHF) dataset, EnerReinforce, to align the model with human preferences and industry standards. Leveraging these resources, Helios undergoes large-scale pretraining, supervised fine-tuning (SFT), and RLHF. We also release EnerBench, a benchmark for evaluating LLMs in smart energy scenarios, and demonstrate that our approach significantly enhances domain knowledge mastery, task execution accuracy, and alignment with human preferences.
