Unlike human reasoning in abstract conceptual spaces, large language models (LLMs) typically reason by generating discrete tokens, which potentially limits their expressive power. The recent work Soft Thinking has shown that latent reasoning via soft concepts is a promising direction for LLMs, but LLMs are trained on discrete tokens. To reduce this gap between the soft concepts used in reasoning and the discrete tokens seen in training, we propose Soft Concept Mixing (SCM), a soft-concept-aware training scheme that directly exposes the model to soft representations during training. Specifically, SCM constructs a soft concept vector as a probability-weighted average of token embeddings. This vector is then mixed into the model's hidden states, which carry rich contextual information. Finally, the entire latent reasoning process is optimized with Reinforcement Learning (RL). Experiments on five reasoning benchmarks demonstrate that SCM improves the reasoning performance of LLMs while maintaining stable training dynamics.
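The two core operations described above can be sketched numerically: a soft concept vector is a probability-weighted average of token embeddings, which is then mixed into a hidden state. This is a minimal illustration, not the paper's implementation; the interpolation coefficient `alpha` and the linear-mixing form are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 10, 8

# Toy embedding table standing in for the model's token embedding matrix.
embedding = rng.standard_normal((vocab_size, d_model))

# Next-token distribution over the vocabulary at the current reasoning step.
logits = rng.standard_normal(vocab_size)
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Soft concept vector: probability-weighted average of token embeddings,
# i.e. the expected embedding under the model's output distribution.
soft_concept = probs @ embedding          # shape: (d_model,)

# Hypothetical mixing step: interpolate the soft concept into the
# contextual hidden state (alpha is an assumed mixing coefficient).
hidden_state = rng.standard_normal(d_model)
alpha = 0.5
mixed_state = (1 - alpha) * hidden_state + alpha * soft_concept
```

Because `probs` sums to one, the soft concept vector always lies in the convex hull of the token embeddings, so the mixed state stays close to representations the model has seen during training.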