Bias in AI systems can lead to unfair and discriminatory outcomes, especially when left untested before deployment. Although fairness testing aims to identify and mitigate such bias, existing tools are often difficult to use, requiring advanced expertise and offering limited support for real-world workflows. To address this, we introduce Bita, a conversational assistant designed to help software testers detect potential sources of bias, evaluate test plans through a fairness lens, and generate fairness-oriented exploratory testing charters. Bita integrates a large language model with retrieval-augmented generation, grounding its responses in curated fairness literature. Our validation demonstrates how Bita supports fairness testing tasks on real-world AI systems, providing structured, reproducible evidence of its utility. In summary, our work contributes a practical tool that operationalizes fairness testing in a way that is accessible, systematic, and directly applicable to industrial practice.
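The abstract mentions that Bita grounds a large language model with retrieval-augmented generation over curated fairness literature. As a minimal sketch of that general pattern only, the snippet below shows retrieval of relevant passages followed by prompt construction; the corpus excerpts, the `retrieve` and `grounded_prompt` helpers, and the `llm_complete` call are illustrative assumptions, not Bita's actual implementation.

```python
# Minimal sketch of retrieval-augmented prompting, assuming a small curated
# fairness-literature corpus and a hypothetical llm_complete() API.
from typing import List

# Hypothetical excerpts standing in for the curated fairness literature.
FAIRNESS_CORPUS: List[str] = [
    "Disparate impact arises when outcomes differ across protected groups.",
    "Exploratory testing charters should state the target, resources, and bias risks to probe.",
    "Proxy features such as zip code can encode protected attributes indirectly.",
]

def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Rank passages by naive keyword overlap with the tester's query."""
    q_terms = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q_terms & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def grounded_prompt(question: str) -> str:
    """Prepend retrieved passages so the model answers from cited context."""
    context = "\n".join(f"- {p}" for p in retrieve(question, FAIRNESS_CORPUS))
    return ("Answer the tester's question using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {question}")

if __name__ == "__main__":
    prompt = grounded_prompt(
        "Which features in my loan-scoring test plan could act as bias proxies?")
    print(prompt)
    # The grounded prompt would then be passed to the underlying LLM, e.g.:
    # answer = llm_complete(prompt)  # hypothetical API call
```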