In the global drive toward carbon neutrality, deeply coordinated smart energy systems underpin industrial transformation. However, because expertise in this domain is interdisciplinary, fragmented, and fast-evolving, general-purpose large language models (LLMs), which lack domain knowledge and physical-constraint awareness, struggle to deliver precise, engineering-aligned inference and generation. To address these challenges, we introduce Helios, an LLM tailored to the smart energy domain, together with a comprehensive suite of resources to advance LLM research in this field. Specifically, we develop Enersys, a multi-agent collaborative framework for end-to-end dataset construction, through which we produce: (1) a smart energy knowledge base, EnerBase, to enrich the model's foundational expertise; (2) an instruction fine-tuning dataset, EnerInstruct, to strengthen performance on domain-specific downstream tasks; and (3) a reinforcement learning from human feedback (RLHF) dataset, EnerReinforce, to align the model with human preferences and industry standards. Leveraging these resources, Helios undergoes large-scale pretraining, supervised fine-tuning (SFT), and RLHF. We also release EnerBench, a benchmark for evaluating LLMs in smart energy scenarios, and demonstrate that our approach significantly enhances domain knowledge mastery, task execution accuracy, and alignment with human preferences.