Audiobook interpretations are attracting increasing attention because they offer accessible, in-depth analyses of books that give readers practical insights and intellectual inspiration. However, producing them manually remains time-consuming and resource-intensive. To address this challenge, we propose AI4Reading, a multi-agent collaboration system that leverages large language models (LLMs) and speech synthesis technology to generate podcast-like audiobook interpretations. The system is designed to meet three key objectives: accurate content preservation, enhanced comprehensibility, and a logical narrative structure. To achieve these goals, we develop a framework of 11 specialized agents, including topic analysts, case analysts, editors, a narrator, and proofreaders, that work in concert to explore themes, extract real-world cases, refine content organization, and synthesize natural spoken language. A comparison of expert interpretations with our system's output shows that, although AI4Reading still lags behind in speech generation quality, its generated interpretative scripts are simpler and more accurate.
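To make the idea of role-specialized agents "working in concert" concrete, the following is a minimal sketch of a sequential multi-agent pipeline in which each agent transforms a shared draft in turn. It is not AI4Reading's implementation: the abstract does not specify the agents' ordering, prompts, or communication pattern. The `Agent` and `Pipeline` classes, the role instructions, and the `echo_llm` stub (which stands in for a real LLM call) are all hypothetical. The sketch also uses one agent per role rather than the 11 agents described, and it omits speech synthesis.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical LLM interface: (system_prompt, user_content) -> model output.
LLM = Callable[[str, str], str]


@dataclass
class Agent:
    role: str         # e.g. "topic analyst"; role names taken from the abstract
    instruction: str  # hypothetical role prompt, not from the paper

    def run(self, llm: LLM, draft: str) -> str:
        # Each agent rewrites or extends the shared draft according to its role.
        system_prompt = f"Role: {self.role}\nTask: {self.instruction}"
        return llm(system_prompt, draft)


@dataclass
class Pipeline:
    agents: List[Agent]
    trace: List[str] = field(default_factory=list)  # roles in execution order

    def run(self, llm: LLM, book_text: str) -> str:
        # Assumed sequential hand-off: each agent's output feeds the next agent.
        draft = book_text
        for agent in self.agents:
            draft = agent.run(llm, draft)
            self.trace.append(agent.role)
        return draft


def echo_llm(system_prompt: str, user_content: str) -> str:
    # Deterministic stub in place of a real model: tags the draft with the
    # role that processed it, so the hand-off order is visible.
    role = system_prompt.splitlines()[0].removeprefix("Role: ")
    return f"{user_content} [{role}]"


if __name__ == "__main__":
    agents = [
        Agent("topic analyst", "Identify the book's main themes."),
        Agent("case analyst", "Extract real-world cases that illustrate each theme."),
        Agent("editor", "Reorganize the material into a logical narrative."),
        Agent("narrator", "Rewrite the script as natural spoken language."),
        Agent("proofreader", "Correct errors and inconsistencies in the script."),
    ]
    pipeline = Pipeline(agents)
    print(pipeline.run(echo_llm, "BOOK"))
    # BOOK [topic analyst] [case analyst] [editor] [narrator] [proofreader]
```

A strict chain is the simplest way to show how role specialization divides the work. A real system might instead let agents run in parallel (for example, several topic analysts) or loop between an editor and a proofreader until the script stabilizes.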