The growing complexity of log data in modern software systems has prompted the use of Large Language Models (LLMs) for automated log analysis. Current approaches typically rely on direct supervised fine-tuning (SFT) on log-label pairs. However, this exacerbates the domain discrepancy between general-purpose LLMs and specialized log data, causing overfitting. Furthermore, SFT's imbalanced loss computation often allows lengthy contexts to overwhelm the critical, concise details in model answers, leading to hallucinations. To address these limitations, we propose R-Log, a novel reasoning-based paradigm that mirrors the structured, step-by-step analytical process of human engineers. This approach enhances generalizability by learning the underlying rules behind conclusions. We further employ Reinforcement Learning (RL) to optimize the model within a simulated O&M environment, thereby reducing hallucinations by directly rewarding correct outcomes. R-Log is first cold-started on a curated dataset of 2k+ reasoning trajectories, guided by 13 strategies drawn from manual O&M practice, to establish an initial reasoning capability. This capability is then refined via RL using a joint reward function. Empirical evaluations on real-world logs show that R-Log outperforms existing methods across five log analysis tasks, particularly in unseen scenarios (by 228.05%). We also design R-Log-fast, which achieves a 5x inference speedup while retaining 93% of R-Log's efficacy.
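To make the "joint reward function" idea concrete, the following is a minimal, hypothetical Python sketch of how such a reward might combine a format check on the reasoning trace with outcome correctness against the ground-truth label. The tag conventions, helper names, and weights below are illustrative assumptions, not the reward design used by R-Log.

```python
# Hypothetical sketch of a joint reward for RL-tuning a log-analysis LLM.
# Assumption: responses follow a <think>...</think><answer>...</answer> convention;
# the actual R-Log reward design is not specified here.
import re


def format_reward(response: str) -> float:
    """1.0 if the response contains both a reasoning block and a final answer."""
    has_reasoning = re.search(r"<think>.*?</think>", response, re.DOTALL) is not None
    has_answer = re.search(r"<answer>.*?</answer>", response, re.DOTALL) is not None
    return 1.0 if (has_reasoning and has_answer) else 0.0


def outcome_reward(response: str, gold_label: str) -> float:
    """1.0 if the extracted final answer matches the ground-truth label."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        return 0.0
    prediction = match.group(1).strip().lower()
    return 1.0 if prediction == gold_label.strip().lower() else 0.0


def joint_reward(response: str, gold_label: str,
                 w_format: float = 0.2, w_outcome: float = 0.8) -> float:
    """Weighted combination of format and outcome rewards (weights are assumptions)."""
    return (w_format * format_reward(response)
            + w_outcome * outcome_reward(response, gold_label))


if __name__ == "__main__":
    resp = ("<think>ERROR lines repeat every 5s, pointing to a failing disk.</think>"
            "<answer>anomaly</answer>")
    print(joint_reward(resp, "anomaly"))  # -> 1.0
```

In practice, such a scalar reward would be fed to a policy-optimization algorithm (e.g., PPO or GRPO) over sampled model responses; the choice of algorithm and weighting is an assumption here.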