ImageSubject: A Large-scale Dataset for Subject Detection

Images and videos usually contain main subjects: the objects that the photographer wants to highlight. Human viewers identify them easily, but algorithms often confuse them with other objects. Detecting the main subjects is an important technique for helping machines understand the content of images and videos. We present a new dataset for training models to understand the layout of the objects and the context of an image, and then to find the main subjects among those objects. This is achieved in three aspects. By gathering images from movie shots created by directors with professional shooting skills, we collect a highly diverse dataset: it contains 107,700 images from 21,540 movie shots. We label the images with bounding boxes for two classes: subject and non-subject foreground object. We present a detailed analysis of the dataset and compare the task with saliency detection and object detection. ImageSubject is the first dataset that tries to localize the subject that the photographer wants to highlight in an image. Moreover, we find that a transformer-based detection model gives the best results among popular model architectures. Finally, we discuss potential applications and conclude with the importance of the dataset.
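To make the labeling scheme concrete, the sketch below shows one way an annotated image with the two classes could be represented. The abstract does not describe the dataset's file format, so the class names, field names, and box convention here are all hypothetical and chosen only for illustration. The only figures taken from the abstract are the image and shot counts, which average out to five images per shot.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical label strings: the abstract names two classes but not
# how they are encoded in the released annotations.
SUBJECT = "subject"
NON_SUBJECT = "non_subject_foreground"


@dataclass
class Box:
    label: str
    # Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
    xyxy: Tuple[float, float, float, float]


@dataclass
class AnnotatedImage:
    shot_id: str        # hypothetical identifier of the source movie shot
    frame_index: int    # hypothetical position of the image within the shot
    boxes: List[Box]


def main_subjects(image: AnnotatedImage) -> List[Box]:
    """Return only the boxes labeled as main subjects."""
    return [box for box in image.boxes if box.label == SUBJECT]


# Counts stated in the abstract.
NUM_IMAGES = 107_700
NUM_SHOTS = 21_540
average_images_per_shot = NUM_IMAGES / NUM_SHOTS  # 5.0 on average

# A made-up example record, not taken from the dataset.
example = AnnotatedImage(
    shot_id="shot_0001",
    frame_index=0,
    boxes=[
        Box(SUBJECT, (120.0, 40.0, 380.0, 460.0)),
        Box(NON_SUBJECT, (400.0, 200.0, 520.0, 450.0)),
    ],
)
```

Calling `main_subjects(example)` returns the single box labeled `subject`, which is the output a subject detector would be expected to produce for this image.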


