While olfaction is central to how animals perceive the world, this rich chemical sensory modality remains largely inaccessible to machines. One key bottleneck is the lack of diverse, multimodal olfactory training data collected in natural settings. We present New York Smells, a large dataset of paired image and olfactory signals captured ``in the wild.'' Our dataset contains 7,000 smell-image pairs from 3,500 distinct objects across indoor and outdoor environments, approximately 70$\times$ more objects than existing olfactory datasets. Our benchmark has three tasks: cross-modal smell-to-image retrieval; recognition of scenes, objects, and materials from smell alone; and fine-grained discrimination between grass species. Through experiments on our dataset, we find that visual data enables cross-modal olfactory representation learning, and that our learned olfactory representations outperform widely used hand-crafted features.