Scaling Unverifiable Rewards: A Case Study on Visual Insights

Large Language Model (LLM) agents can increasingly automate complex reasoning through Test-Time Scaling (TTS), that is, iterative refinement guided by reward signals. However, many real-world tasks involve multi-stage pipelines whose final outcomes lack verifiable rewards or sufficient data to train robust reward models, making judge-based refinement prone to accumulating errors across stages. We propose Selective TTS, a process-based refinement framework that scales inference across the different stages of a multi-agent pipeline, rather than refining repeatedly over time as in prior work. By distributing compute across stages and pruning low-quality branches early using process-specific judges, Selective TTS mitigates judge drift and stabilizes refinement. Grounded in the data science pipeline, we build an end-to-end multi-agent pipeline for generating visually insightful charts and a report for a given dataset, and design a reliable LLM-based judge model aligned with human experts (Kendall's τ = 0.55). Selective TTS then improves insight quality under a fixed compute budget, increasing mean scores from 61.64 to 65.86 while reducing variance. We hope our findings serve as the first step toward scaling complex, open-ended tasks with unverifiable rewards, such as scientific discovery and story generation.
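The stage-wise idea described in the abstract (spend compute at every stage, score candidates with a stage-specific judge, and prune weak branches before moving on) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `selective_tts`, the parameters `widths` and `keep`, the stage names, and the toy random "generators" and "judge" are all assumptions made purely for illustration. In a real system each stage and judge would presumably be an LLM call.

```python
import random
from typing import Callable, List

# Hypothetical representation: a branch is the list of stage outputs so far.
Branch = List[str]
Stage = Callable[[Branch, random.Random], str]
Judge = Callable[[Branch], float]


def selective_tts(
    stages: List[Stage],
    judges: List[Judge],
    widths: List[int],
    keep: List[int],
    seed: int = 0,
) -> List[Branch]:
    """Expand every surviving branch widths[i] times at stage i, score the
    candidates with that stage's judge, and keep only the top keep[i]."""
    rng = random.Random(seed)
    branches: List[Branch] = [[]]  # start from a single empty branch
    for stage, judge, width, k in zip(stages, judges, widths, keep):
        candidates = [
            branch + [stage(branch, rng)]
            for branch in branches
            for _ in range(width)
        ]
        # Prune early: low-scoring branches never reach later stages.
        candidates.sort(key=judge, reverse=True)
        branches = candidates[:k]
    return branches


def make_stage(name: str) -> Stage:
    """Toy stand-in for a generator agent: emits '<name>:<quality digit>'."""
    def stage(branch: Branch, rng: random.Random) -> str:
        return f"{name}:{rng.randint(0, 9)}"
    return stage


def last_step_score(branch: Branch) -> float:
    """Toy stand-in for a process-specific judge: reads the quality digit
    of the most recent stage output."""
    return float(branch[-1].split(":")[1])


# Hypothetical three-stage pipeline; the stage names are illustrative only.
stage_names = ["profile", "chart", "insight"]
best = selective_tts(
    stages=[make_stage(n) for n in stage_names],
    judges=[last_step_score] * len(stage_names),
    widths=[4, 3, 2],
    keep=[2, 2, 1],
)
print(best)
```

With these settings the sketch makes 4 + 6 + 4 = 14 generator calls in total, since only the kept branches are expanded at each later stage. Under a fixed budget, the per-stage `widths` and `keep` values are the knobs that decide where that compute is spent.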

