Web accessibility is crucial for advancing social welfare, justice, and equality in digital spaces, yet the vast majority of website user interfaces remain non-compliant, due in part to the resource-intensive and unscalable nature of current auditing practices. While WCAG-EM offers a structured methodology for site-wide conformance evaluation, it demands substantial human effort and lacks practical support for execution at scale. In this work, we present AAA, an auditing framework that operationalizes WCAG-EM through a human-AI partnership model. AAA is anchored by two key innovations: GRASP, a graph-based multimodal sampling method that ensures representative page coverage via learned embeddings of visual, textual, and relational cues; and MaC, a multimodal large language model-based copilot that supports auditors through cross-modal reasoning and intelligent assistance in high-effort tasks. Together, these components enable scalable, end-to-end web accessibility auditing, giving human auditors AI-enhanced assistance for real-world impact. We further contribute four novel datasets for benchmarking core stages of the audit pipeline. Extensive experiments demonstrate the effectiveness of our methods and show that small-scale language models, once fine-tuned, can serve as capable experts.