Interpretability and human oversight are fundamental pillars for deploying complex NLP models in real-world applications. However, applying explainability and human-in-the-loop methods requires technical proficiency. While toolkits for model understanding and analysis exist, options for integrating human feedback are still limited. We propose IFAN, a framework for real-time explanation-based interaction with NLP models. Through IFAN's interface, users can provide feedback on selected model explanations, which is then integrated through adapter layers to align the model with human rationales. We show the system to be effective in debiasing a hate speech classifier with minimal loss of performance. IFAN also offers a visual admin system and API to manage models (and datasets) as well as control access rights. A demo is live at https://ifan.ml/.
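To make the adapter-based feedback integration concrete, the sketch below shows one plausible reading of the abstract, not IFAN's actual implementation: a bottleneck adapter inserted into a frozen backbone, trained with a loss that combines the task objective with a penalty on attribution mass over tokens a user flagged as irrelevant (e.g., identity terms in hate speech data). The `Adapter` and `feedback_loss` names and the penalty formulation are illustrative assumptions.

```python
# Hypothetical sketch of explanation-feedback fine-tuning via adapters.
# Not IFAN's code: the adapter design and feedback penalty are assumptions.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's behavior recoverable.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

def feedback_loss(logits, labels, attributions, flagged_mask, lam=0.1):
    """Task cross-entropy plus a penalty on attribution mass assigned to
    tokens the user flagged as irrelevant in the explanation view."""
    task = nn.functional.cross_entropy(logits, labels)
    penalty = (attributions.abs() * flagged_mask).sum(dim=-1).mean()
    return task + lam * penalty

if __name__ == "__main__":
    batch, seq_len, hidden, n_classes = 4, 16, 32, 2
    adapter = Adapter(hidden)
    hidden_states = torch.randn(batch, seq_len, hidden)
    out = adapter(hidden_states)                   # shape: (4, 16, 32)
    logits = torch.randn(batch, n_classes, requires_grad=True)
    labels = torch.randint(0, n_classes, (batch,))
    attributions = torch.randn(batch, seq_len)     # e.g., saliency scores
    flagged = torch.zeros(batch, seq_len)
    flagged[:, :3] = 1.0                           # user-flagged positions
    loss = feedback_loss(logits, labels, attributions, flagged)
    loss.backward()
```

Under this reading, only the adapter parameters would be optimized while the backbone stays frozen, which is consistent with the abstract's claim of debiasing with minimal performance loss.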