LLM Driven Processes to Foster Explainable AI

We present a modular, explainable LLM-agent pipeline for decision support that externalizes reasoning into auditable artifacts. The system instantiates three frameworks: Vester's Sensitivity Model (factor set, signed impact matrix, systemic roles, feedback loops); normal-form games (strategies, payoff matrix, equilibria); and sequential games (role-conditioned agents, tree construction, backward induction). Modules are swappable at every step. LLM components (default: GPT-5) are paired with deterministic analyzers for equilibria and matrix-based role classification, yielding traceable intermediates rather than opaque outputs. In a real-world logistics case (100 runs), mean factor alignment with a human baseline was 55.5% over 26 factors and 62.9% on the transport-core subset; role agreement on matched factors was 57%. An LLM judge using an eight-criterion rubric (max 100) scored runs on par with a reconstructed human baseline. Configurable LLM pipelines can thus mimic expert workflows with transparent, inspectable steps.
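
To make the idea of a deterministic analyzer concrete, the following is a minimal sketch of the two analyzer types named above: role classification from a signed impact matrix, and pure-strategy equilibrium search on a payoff matrix. It is not the authors' implementation. The function names, the example factors, the mean-based thresholds, and the four role labels are all assumptions chosen for illustration; Vester's full method uses a finer-grained role scheme.

```python
from itertools import product


def classify_roles(factors, impact):
    """Assign simplified Vester-style roles from a signed impact matrix.

    impact[i][j] is the signed influence of factor i on factor j.
    Assumption: a factor counts as "high" when its sum exceeds the mean;
    this threshold is illustrative, not taken from the paper.
    """
    n = len(factors)
    # Active sum: how strongly a factor drives the others (row sum of |impact|).
    active = [sum(abs(impact[i][j]) for j in range(n) if j != i) for i in range(n)]
    # Passive sum: how strongly a factor is driven by the others (column sum).
    passive = [sum(abs(impact[i][j]) for i in range(n) if i != j) for j in range(n)]
    a_mean = sum(active) / n
    p_mean = sum(passive) / n

    roles = {}
    for k, name in enumerate(factors):
        high_a = active[k] > a_mean
        high_p = passive[k] > p_mean
        if high_a and high_p:
            roles[name] = "critical"
        elif high_a:
            roles[name] = "active"
        elif high_p:
            roles[name] = "reactive"
        else:
            roles[name] = "buffering"
    return roles


def pure_nash_equilibria(payoff_row, payoff_col):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoff_row[r][c] and payoff_col[r][c] are the payoffs of the row and
    column player when row plays r and column plays c.
    """
    rows, cols = len(payoff_row), len(payoff_row[0])
    equilibria = []
    for r, c in product(range(rows), range(cols)):
        # Neither player can gain by deviating unilaterally.
        row_best = all(payoff_row[r][c] >= payoff_row[r2][c] for r2 in range(rows))
        col_best = all(payoff_col[r][c] >= payoff_col[r][c2] for c2 in range(cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Hypothetical logistics factors and impact values, for illustration only.
factors = ["fuel_price", "transport_cost", "delivery_delay", "office_layout"]
impact = [
    [0, 3, 2, 0],
    [0, 0, -3, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
roles = classify_roles(factors, impact)

# Prisoner's dilemma payoffs (strategy 0 = cooperate, 1 = defect).
payoff_row = [[-1, -3], [0, -2]]
payoff_col = [[-1, 0], [-3, -2]]
equilibria = pure_nash_equilibria(payoff_row, payoff_col)  # [(1, 1)]
```

Because both functions are pure and deterministic, the same LLM-produced matrix always yields the same roles and equilibria, which is what makes the intermediate artifacts auditable in a design of this kind.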
