Deep research agents, which synthesize information across diverse sources, are significantly constrained by their sequential reasoning processes. This architectural bottleneck results in high latency, poor runtime adaptability, and inefficient resource allocation, making them impractical for interactive applications. To overcome this, we introduce FlashResearch, a novel framework for efficient deep research that replaces sequential processing with parallel execution orchestrated at runtime, dynamically decomposing complex queries into tree-structured sub-tasks. Our core contributions are threefold: (1) an adaptive planner that dynamically allocates computational resources by determining research breadth and depth based on query complexity; (2) a real-time orchestration layer that monitors research progress and prunes redundant paths to reallocate resources and optimize efficiency; and (3) a multi-dimensional parallelization framework that enables concurrency across both research breadth and depth. Experiments show that FlashResearch consistently improves final report quality within fixed time budgets and can deliver up to a 5x speedup while maintaining comparable quality.
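The tree-structured decomposition described in the abstract can be illustrated with a minimal sketch. This is not the FlashResearch implementation: every name here (`Node`, `explore`, `research`, `toy_planner`, `count`) is hypothetical, the planner is a fixed toy rule rather than a complexity-aware model, and pruning is reduced to dropping case-insensitive duplicate sub-queries as a stand-in for the progress monitoring the abstract mentions. It only shows how sibling sub-tasks (breadth) and deeper levels (depth) might run concurrently under `asyncio`.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Node:
    """One sub-task in the research tree (hypothetical structure)."""
    query: str
    finding: str = ""
    children: list["Node"] = field(default_factory=list)


async def research(query: str) -> str:
    """Stand-in for a search or LLM call; the sleep simulates latency."""
    await asyncio.sleep(0.01)
    return f"finding for {query!r}"


def toy_planner(query: str, depth: int, max_depth: int) -> list[str]:
    """Assumed planner stub: fixed breadth, stops expanding at max_depth.
    The third sub-query duplicates the first except for letter case."""
    if depth >= max_depth:
        return []
    return [f"{query}: background", f"{query}: methods", f"{query}: Background"]


async def explore(query, planner, seen, depth=0, max_depth=2) -> Node:
    # Pruning (simplified): drop sub-queries already claimed elsewhere.
    subqueries = []
    for sub in planner(query, depth, max_depth):
        key = sub.lower().strip()
        if key not in seen:
            seen.add(key)
            subqueries.append(sub)
    # Breadth: sibling sub-tasks run concurrently.
    # Depth: children start while this node's own research is still in flight.
    own, *children = await asyncio.gather(
        research(query),
        *(explore(s, planner, seen, depth + 1, max_depth) for s in subqueries),
    )
    return Node(query, own, children)


def count(node: Node) -> int:
    """Total number of nodes in the tree rooted at `node`."""
    return 1 + sum(count(child) for child in node.children)


if __name__ == "__main__":
    tree = asyncio.run(explore("q", toy_planner, set()))
    print(count(tree), "nodes explored")
```

Because each node awaits its own research call together with its children, the wall-clock time of this sketch is governed by the tree's depth rather than its node count. A real orchestrator would presumably also cancel in-flight branches and rebalance budgets, which this example leaves out.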