Textbooks are the primary vehicle for delivering quality education to students. Explanatory and illustrative visuals have been shown to play a key role in the retention, comprehension, and general transfer of knowledge. However, many textbooks, especially in the developing world, are of low quality and lack interesting visuals to support student learning. In this paper, we investigate how effectively vision-language models can automatically enhance textbooks with images from the web. Specifically, we collect a dataset of e-textbooks from one of the largest free online publishers in the world. We rigorously analyse the dataset and use the resulting analysis to motivate a task that involves retrieving web images and appropriately assigning them to textbooks, which we frame as a novel optimization problem. Through a crowd-sourced evaluation, we verify that (1) while the original textbook images are rated higher, automatically assigned ones are not far behind, and (2) the choice of the optimization problem matters. We release the dataset of textbooks, together with an associated image bank, to spur further research in this area.
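To make the idea of "assigning images as an optimization problem" concrete, here is a toy sketch. It is not the authors' formulation, which the abstract does not specify. It assumes a hypothetical score matrix `scores[i][j]` giving the similarity between textbook section `i` and candidate web image `j` (for example, a cosine similarity from a vision-language model). It then contrasts two generic choices: a per-section greedy pick that may reuse images, and an exact one-to-one assignment that maximizes total similarity. The functions `greedy_assign` and `assign_images` are hypothetical helpers written for illustration only.

```python
from itertools import permutations


def greedy_assign(scores):
    """Hypothetical baseline: each section independently takes its best image.

    Images may be reused across sections. Ties go to the lowest image index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def assign_images(scores):
    """Hypothetical one-to-one assignment maximizing total similarity.

    scores[i][j] is an assumed similarity between section i and image j.
    Every section receives exactly one image, and no image is used twice.
    Brute force over permutations, so this is only suitable for tiny inputs.
    Returns (best_total, assignment), where assignment[i] is section i's image.
    """
    n_sections = len(scores)
    n_images = len(scores[0])
    if n_images < n_sections:
        raise ValueError("need at least as many images as sections")

    best_total, best = float("-inf"), None
    # Try every ordered choice of distinct images for the sections.
    for perm in permutations(range(n_images), n_sections):
        total = sum(scores[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best = total, perm
    return best_total, list(best)


if __name__ == "__main__":
    toy = [[0.9, 0.8], [0.85, 0.1]]  # made-up scores for 2 sections x 2 images
    print(greedy_assign(toy))   # both sections pick image 0
    print(assign_images(toy))   # one-to-one: section 0 -> image 1, section 1 -> image 0
```

On the toy scores, the greedy rule assigns image 0 to both sections. The one-to-one formulation instead trades a slightly worse match for section 0 to get a much better overall total. This simply shows that different formulations can yield different assignments. For realistic sizes, an exact solver such as SciPy's `linear_sum_assignment` would replace the brute-force loop.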