FooDI-ML: a large multi-language dataset of food, drinks and groceries images and descriptions

In this paper we introduce the FooDI-ML dataset. It contains over 1.5M unique images and over 9.5M store names, product names, product descriptions, and collection sections gathered from the Glovo application. The data covers food, drinks and groceries products from 37 countries in Europe, the Middle East, Africa and Latin America. The dataset spans 33 languages, including 870K samples in languages of Eastern European and Western Asian countries, such as Ukrainian and Kazakh, which have so far been underrepresented in publicly available visio-linguistic datasets. It also includes widely spoken languages such as Spanish and English. To assist further research, we include benchmarks on two tasks: text-image retrieval and conditional image generation.


