Accurate and interpretable crop disease diagnosis is essential for agricultural decision-making, yet existing methods often rely on costly supervised fine-tuning and perform poorly under domain shift. We propose Caption--Prompt--Judge (CPJ), a training-free few-shot framework that enhances Agri-Pest VQA through structured, interpretable image captions. CPJ uses large vision-language models to generate multi-angle captions, refines them iteratively with an LLM-as-Judge module, and then uses the refined captions to inform a dual-answer VQA process that produces both recognition and management responses. Evaluated on CDDMBench, CPJ significantly improves performance: with GPT-5-mini captions, GPT-5-Nano gains \textbf{+22.7} pp in disease classification and \textbf{+19.5} points in QA score over no-caption baselines. The framework provides transparent, evidence-based reasoning, advancing robust and explainable agricultural diagnosis without fine-tuning. Our code and data are publicly available at: https://github.com/CPJ-Agricultural/CPJ-Agricultural-Diagnosis.