In this paper we introduce the FooDI-ML dataset. The dataset contains over 1.5M unique images and over 9.5M store names, product names, descriptions, and collection sections gathered from the Glovo application. The data made available corresponds to food, drink, and grocery products from 37 countries in Europe, the Middle East, Africa, and Latin America. The dataset covers 33 languages, including 870K samples in languages of Eastern European and Western Asian countries, such as Ukrainian and Kazakh, which have so far been underrepresented in publicly available visio-linguistic datasets. It also includes widely spoken languages such as Spanish and English. To assist further research, we include benchmarks for two tasks: text-image retrieval and conditional image generation.