Table reasoning involves transforming natural language questions into corresponding answers based on a provided data table. Recent research exploits large language models (LLMs) to facilitate table reasoning; however, LLMs struggle with pattern recognition and lack support for visual pattern exploration. To address these limitations, we propose VisTR, a framework that leverages visualizations as representations to facilitate data pattern recognition and support cross-modal exploration. VisTR consists of four major modules: 1) visualization alignment, which utilizes multimodal LLMs to align visualizations across modalities, including chart, text, and sketch; 2) visualization referencing, which decomposes a table into multifaceted visualization references that comprehensively represent it; 3) visualization pruning, which combines data and retrieval pruning to discard uninformative visualization references and improve retrieval efficiency; and 4) visualization interaction, which offers an interactive visual interface with multimodal interactions for user-friendly table reasoning. A quantitative evaluation against existing multimodal LLMs demonstrates the effectiveness of the alignment model in cross-modal visualization pairing. We further illustrate the applicability of the proposed framework in various time-series table reasoning and exploration tasks.
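To make the pipeline concrete, the sketch below illustrates the core retrieval loop implied by the abstract: visualization references are embedded into a shared space, uninformative references are pruned, and a text or sketch query retrieves the closest charts by cosine similarity. Everything here is illustrative, not VisTR's actual implementation: the names (`embed`, `VisReference`, `prune`, `retrieve`), the embedding dimensionality, the information score, and the random placeholder encoder are all assumptions standing in for the paper's trained alignment model.

```python
"""Minimal sketch of cross-modal visualization retrieval, assuming an
alignment model that maps charts, text, and sketches into one shared
embedding space. All names and the placeholder encoder are hypothetical."""
from dataclasses import dataclass
import numpy as np

rng = np.random.default_rng(0)
DIM = 64  # assumed embedding dimensionality


def embed(item: str, modality: str) -> np.ndarray:
    """Placeholder for the multimodal encoder; returns a unit vector.
    A real system would dispatch on `modality` (chart/text/sketch) and
    call the trained alignment model here."""
    vec = rng.normal(size=DIM)
    return vec / np.linalg.norm(vec)


@dataclass
class VisReference:
    """A visualization reference derived from one facet of the table."""
    facet: str            # e.g. "monthly revenue trend"
    embedding: np.ndarray
    info_score: float     # assumed measure of how informative the facet is


def prune(refs: list[VisReference], min_info: float = 0.2) -> list[VisReference]:
    """Data pruning: drop references whose underlying data carries little
    information (e.g., flat or near-constant facets)."""
    return [r for r in refs if r.info_score >= min_info]


def retrieve(query: str, modality: str, refs: list[VisReference], k: int = 3):
    """Retrieval: rank references by cosine similarity to the query
    embedding; the same path serves text and sketch queries."""
    q = embed(query, modality)
    return sorted(refs, key=lambda r: -float(q @ r.embedding))[:k]


if __name__ == "__main__":
    refs = [VisReference(f, embed(f, "chart"), s)
            for f, s in [("monthly revenue trend", 0.9),
                         ("constant tax rate", 0.05),
                         ("regional sales breakdown", 0.7)]]
    for r in retrieve("when did revenue peak?", "text", prune(refs)):
        print(r.facet)
```

With random placeholder embeddings the ranking is of course meaningless; the point is the shape of the pipeline: prune first to shrink the candidate set, then retrieve across modalities through the shared embedding space.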