Large language models (LLMs) can reshape information processing by handling data analysis, visualization, and interpretation within an interactive, context-aware dialogue with users, including voice interaction, while maintaining high performance. In this article, we present Talk2Data, a multimodal LLM-driven conversational agent for intuitive data exploration. The system lets users query datasets with voice or text instructions and receive answers as plots, tables, statistics, or spoken explanations. The design combines the OpenAI Whisper automatic speech recognition (ASR) model, the Qwen-coder code-generation LLM, custom sandboxed execution tools, and the Coqui text-to-speech (TTS) library within an agentic orchestration loop. Unlike text-only analysis tools, Talk2Data adapts its responses across modalities and supports multi-turn dialogues grounded in dataset context. In an evaluation of 48 tasks on three datasets, our prototype achieved 95.8% accuracy with model-only generation time under 1.7 seconds (excluding ASR and execution time). A comparison across five LLM sizes (1.5B–32B) revealed accuracy-latency-cost trade-offs, with a 7B model providing the best balance for interactive use. By routing between user conversation and code execution confined to a transparent sandbox, while grounding prompts in schema-level context, the Talk2Data agent reliably retrieves actionable insights from tables and keeps its computations verifiable. Beyond the agent itself, we discuss implications for human-data interaction, trust in LLM-driven analytics, and future extensions toward large-scale multimodal assistants.
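To make the routing described above concrete, the following is a minimal sketch of such an agentic loop, assuming hypothetical helpers: `ask_llm` stands in for a Qwen-coder call (e.g., via an API or local inference), and `run_in_sandbox` illustrates restricted execution with `exec`. None of these names come from the Talk2Data implementation; this is an illustration of the route-then-ground pattern, not the paper's code.

```python
# Hypothetical sketch of a route-then-execute agent turn: the LLM decides
# whether a user request needs computation; generated code runs in a
# restricted namespace so the computation stays inspectable.
import io
import contextlib

def ask_llm(prompt: str) -> str:
    """Placeholder for a code-generation LLM call (e.g., Qwen-coder)."""
    raise NotImplementedError  # swap in a real model/API client here

def run_in_sandbox(code: str, df) -> str:
    """Execute generated code with only the dataset and a few safe builtins."""
    allowed = {
        "__builtins__": {"print": print, "len": len, "min": min,
                         "max": max, "sum": sum, "round": round},
        "df": df,  # the loaded dataset is the only injected object
    }
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, allowed)  # `code` can be logged/shown for verifiability
    return buf.getvalue()

def handle_turn(user_text: str, schema: str, df) -> str:
    # Ground the routing prompt in schema-level context, as in the abstract.
    route = ask_llm(f"Schema: {schema}\nUser: {user_text}\n"
                    "Reply CODE if this needs computation, else CHAT.")
    if route.strip() == "CODE":
        code = ask_llm(f"Schema: {schema}\nWrite Python using `df` to answer: "
                       f"{user_text}. Print the result.")
        return run_in_sandbox(code, df)
    return ask_llm(f"Schema: {schema}\nAnswer conversationally: {user_text}")
```

In a full pipeline of the kind the abstract describes, `user_text` would arrive from Whisper transcription and the returned string would be handed to the TTS stage; both are omitted here to keep the sketch self-contained.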