Quality of Experience~(QoE)-driven adaptive bitrate~(ABR) algorithms are typically optimized using QoE models that are based on the mean opinion score~(MOS), while such principles may not account for user heterogeneity on rating scales, resulting in unexpected behaviors. In this paper, we propose \texttt{Jade}, which leverages reinforcement learning with human feedback~(RLHF) technologies to better align the users' opinion scores. \texttt{Jade}'s rank-based QoE model considers relative values of user ratings to interpret the subjective perception of video sessions. We implement linear-based and Deep Neural Network (DNN)-based architectures for satisfying both accuracy and generalization ability. We further propose entropy-aware reinforced mechanisms for training policies with the integration of the proposed QoE models. Experimental results demonstrate that \texttt{Jade} performs favorably on conventional metrics, such as quality and stall ratio, and improves QoE by 8.09\%-38.13\% in different network conditions, emphasizing the importance of user heterogeneity in QoE modeling and the potential of combining linear-based and DNN-based models for performance improvement.
翻译:暂无翻译