We present ImageReward, the first general-purpose text-to-image human preference reward model, which addresses various prevalent issues in generative models and aligns them with human values and preferences. It is trained on data from our systematic annotation pipeline, which covers both rating and ranking components and has collected a dataset of 137k expert comparisons to date. In human evaluation, ImageReward outperforms existing scoring methods (e.g., CLIP by 38.6\%), making it a promising automatic metric for evaluating and improving text-to-image synthesis. The reward model is publicly available via the \texttt{image-reward} package at \url{https://github.com/THUDM/ImageReward}.