Large language models offer opportunities to simulate multi-party deliberation, but realistic modeling remains limited by a lack of speaker-attributed data. Transcripts produced via automatic speech recognition (ASR) assign anonymous speaker labels (e.g., Speaker_1), preventing models from capturing consistent individual behavior. This work introduces a reproducible pipeline that transforms public Zoom recordings into speaker-attributed transcripts enriched with metadata such as persona profiles and pragmatic action tags (e.g., [propose_motion]). We release three local government deliberation datasets: Appellate Court hearings, School Board meetings, and Municipal Council sessions. Fine-tuning LLMs to model specific participants on this "action-aware" data yields a 67% reduction in perplexity and nearly doubles classifier-based metrics for speaker fidelity and realism. Turing-style human evaluations show that our simulations are often indistinguishable from real deliberations, providing a practical and scalable method for complex, realistic civic simulations.
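To make the target data format concrete, the sketch below shows how an anonymous ASR turn might be rewritten as a speaker-attributed, action-aware line. All field names, tag names, and speaker details here are illustrative assumptions, not the paper's actual schema.

```python
# Hypothetical sketch of the transformation described in the abstract:
# an anonymous ASR turn ("Speaker_1: ...") is rewritten with speaker
# attribution, a persona profile, and pragmatic action tags such as
# [propose_motion]. Field names are illustrative, not the released schema.

def attribute_turn(turn: dict, speaker_map: dict, personas: dict) -> str:
    """Render one ASR turn as an action-aware, speaker-attributed line."""
    name = speaker_map.get(turn["speaker_id"], turn["speaker_id"])
    persona = personas.get(name, "unknown role")
    tags = "".join(f"[{t}]" for t in turn.get("actions", []))
    return f"{name} ({persona}): {tags} {turn['text']}"

# Example: one anonymized turn from a hypothetical council session
turn = {
    "speaker_id": "Speaker_1",
    "text": "I move that we adopt the revised budget.",
    "actions": ["propose_motion"],
}
line = attribute_turn(
    turn,
    speaker_map={"Speaker_1": "Councilmember Rivera"},
    personas={"Councilmember Rivera": "council member, District 3"},
)
print(line)
```

A pipeline like the one described would produce such lines at scale, giving a fine-tuned model a stable identity and action vocabulary to condition on rather than an anonymous label.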