Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values. Yet, human values can vary under diverse cultural conditions. Therefore, we introduce a framework for value-aligned classification that performs prediction based on human values explicitly written in the command. Along with the task, we propose a practical approach that distills value-aligned knowledge from large-scale language models (LLMs) to construct value-aligned classifiers in two steps. First, we generate value-aligned training data from LLMs by prompt-based few-shot learning. Next, we fine-tune smaller classification models on the generated data for the task. Empirical results show that our VA-Models surpass multiple baselines by at least 15.56% on the F1-score, including few-shot learning with OPT-175B and existing text augmentation methods. We suggest that using classifiers with explicit human value input improves both the inclusivity and explainability of AI.
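A minimal sketch of the two-step pipeline described above, using the Hugging Face `transformers` library. The value statement, prompt template, parsing heuristic, and model choices (gpt2-large as a stand-in generator, distilbert-base-uncased as the small classifier) are illustrative assumptions and not the paper's exact setup, which uses OPT-175B as the data generator.

```python
# Sketch (under assumed models/prompts): (1) generate value-aligned labeled data
# from an LLM via few-shot prompting, (2) fine-tune a small classifier on it.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments, pipeline)
from datasets import Dataset

# Step 1: generate value-aligned training data by prompt-based few-shot learning.
generator = pipeline("text-generation", model="gpt2-large")  # stand-in for OPT-175B

value = "Jokes that target a person's appearance are offensive."  # hypothetical value
few_shot = (
    f"Value: {value}\n"
    "Sentence: Nice haircut, did you lose a bet?\nLabel: offensive\n"
    "Sentence: I really enjoyed your talk today.\nLabel: acceptable\n"
    "Sentence:"
)

generated = generator(few_shot, max_new_tokens=30, num_return_sequences=8,
                      do_sample=True, temperature=0.9)

def parse(sample_text):
    """Naively extract the generated sentence/label pair from a completion."""
    tail = sample_text[len(few_shot):].strip().split("\n")
    sentence = tail[0].strip()
    label = 1 if len(tail) > 1 and "offensive" in tail[1].lower() else 0
    return {"text": sentence, "label": label}

records = [parse(g["generated_text"]) for g in generated if g["generated_text"]]
train_ds = Dataset.from_list(records)

# Step 2: fine-tune a smaller classification model on the generated data.
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

def tokenize(batch):
    return tok(batch["text"], truncation=True, padding="max_length", max_length=64)

train_ds = train_ds.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="va-classifier", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train_ds,
)
trainer.train()
```

In practice the generated data would be filtered and deduplicated before fine-tuning; the sketch only illustrates how a value statement conditions the data the small classifier is trained on.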