Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams remains an under-explored problem. Unlike captions, stories have more expressive language styles and contain many imaginary concepts that do not appear in the images, which poses challenges to behavioral cloning algorithms. Furthermore, because automatic metrics are limited in their ability to evaluate story quality, reinforcement learning methods with hand-crafted rewards also have difficulty achieving an overall performance boost. We therefore propose an Adversarial REward Learning (AREL) framework that learns an implicit reward function from human demonstrations and then optimizes policy search with the learned reward. Although automatic evaluation indicates only a slight performance boost over state-of-the-art (SOTA) methods in cloning expert behaviors, human evaluation shows that our approach achieves a significant improvement in generating more human-like stories than SOTA systems.
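To make the alternating scheme the abstract describes concrete, below is a minimal, hypothetical PyTorch sketch: a reward model is trained to score human stories above sampled ones, and the policy is updated with REINFORCE against that learned reward. All module names, dimensions, and the toy data are illustrative assumptions, and the two losses are simplified adversarial surrogates rather than the paper's exact objectives.

```python
# A minimal, illustrative sketch of adversarial reward learning (AREL),
# NOT the authors' implementation. Vocabulary size, hidden size, the toy
# data, and the loss forms are all hypothetical stand-ins.
import torch
import torch.nn as nn

VOCAB, HID, MAX_LEN = 1000, 64, 20

class Policy(nn.Module):
    """Toy story generator: maps an image feature to a token sequence."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HID)
        self.rnn = nn.GRUCell(HID, HID)
        self.out = nn.Linear(HID, VOCAB)

    def sample(self, img_feat):
        h = img_feat                                   # init hidden from image
        tok = torch.zeros(img_feat.size(0), dtype=torch.long)
        toks, logps = [], []
        for _ in range(MAX_LEN):
            h = self.rnn(self.embed(tok), h)
            dist = torch.distributions.Categorical(logits=self.out(h))
            tok = dist.sample()
            toks.append(tok)
            logps.append(dist.log_prob(tok))
        return torch.stack(toks, 1), torch.stack(logps, 1)

class RewardModel(nn.Module):
    """Learned reward: scores an (image, story) pair; trained adversarially."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HID)
        self.score = nn.Linear(2 * HID, 1)

    def forward(self, img_feat, story):
        s = self.embed(story).mean(1)                  # mean-pooled story encoding
        return self.score(torch.cat([img_feat, s], -1)).squeeze(-1)

policy, reward = Policy(), RewardModel()
opt_p = torch.optim.Adam(policy.parameters(), lr=1e-3)
opt_r = torch.optim.Adam(reward.parameters(), lr=1e-3)

for step in range(3):                                  # toy training loop
    img = torch.randn(8, HID)                          # stand-in image features
    human = torch.randint(VOCAB, (8, MAX_LEN))         # stand-in human stories
    fake, logp = policy.sample(img)

    # Reward step: push human demonstrations above sampled stories,
    # in the spirit of a GAN critic.
    r_loss = -(reward(img, human).mean() - reward(img, fake.detach()).mean())
    opt_r.zero_grad(); r_loss.backward(); opt_r.step()

    # Policy step: REINFORCE against the learned reward,
    # with the batch mean as a simple baseline.
    with torch.no_grad():
        r = reward(img, fake)
        adv = r - r.mean()
    p_loss = -(logp.sum(1) * adv).mean()
    opt_p.zero_grad(); p_loss.backward(); opt_p.step()
```

The key design point this sketch tries to capture is that the reward is learned jointly with the policy rather than hand-crafted, so the policy-gradient signal adapts as the reward model learns what distinguishes human stories from generated ones.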