We present ImageReward -- the first general-purpose text-to-image human preference reward model -- to address prevalent issues in generative models and align them with human values and preferences. Its training is based on our systematic annotation pipeline, which covers both rating and ranking and has collected 137k expert comparisons to date. In human evaluation, ImageReward outperforms existing scoring methods (e.g., CLIP, by 38.6\%), making it a promising automatic metric for evaluating and improving text-to-image synthesis. The reward model is publicly available via the \texttt{image-reward} package at \url{https://github.com/THUDM/ImageReward}.
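For reference, a minimal usage sketch of the \texttt{image-reward} package is shown below; the method names and the model identifier \texttt{ImageReward-v1.0} follow the repository README and should be treated as assumptions rather than a definitive interface.
\begin{verbatim}
# Minimal sketch, assuming the image-reward package API as described
# in the THUDM/ImageReward README (model name and methods assumed).
import ImageReward as RM

# Load the pretrained reward model (weights are downloaded on first use).
model = RM.load("ImageReward-v1.0")

prompt = "a painting of an ocean with clouds and birds, day time"
images = ["image1.png", "image2.png"]  # hypothetical candidate images

# Score each candidate image against the prompt; higher scores indicate
# stronger predicted human preference.
rewards = model.score(prompt, images)

# Rank the candidates by predicted human preference.
ranking, rewards = model.inference_rank(prompt, images)
\end{verbatim}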