This paper presents a framework for evaluating fairness in recommender systems powered by Large Language Models (RecLLMs), addressing the need for a unified approach that spans various fairness dimensions including sensitivity to user attributes, intrinsic fairness, and discussions of fairness based on underlying benefits. In addition, our framework introduces counterfactual evaluations and integrates diverse user group considerations to enhance the discourse on fairness evaluation for RecLLMs. Our key contributions include the development of a robust framework for fairness evaluation in LLM-based recommendations and a structured method to create \textit{informative user profiles} from demographic data, historical user preferences, and recent interactions. We argue that the latter is essential for enhancing personalization in such systems, especially in temporal-driven scenarios. We demonstrate the utility of our framework through practical applications on two datasets, LastFM-1K and ML-1M. We conduct experiments on a subsample of 80 users from each dataset, testing and assessing the effectiveness of various prompt construction scenarios and in-context learning, comprising more than 50 scenarios. This results in more than 4000 recommendations (80 * 50 = 4000). Our study reveals that while there are no significant unfairness issues in scenarios involving sensitive attributes, some concerns remain. However, in terms of intrinsic fairness, which does not involve direct sensitivity, unfairness across demographic groups remains significant. The code and data used for this paper are available at: \url{https://shorturl.at/awBFM}.
翻译:暂无翻译