Dataset Summarization by K Principal Concepts

We propose the new task of K principal concept identification for dataset summarization. The objective is to find a set of K concepts that best explain the variation within the dataset. Concepts are high-level, human-interpretable terms such as "tiger", "kayaking" or "happy". The K concepts are selected from a (potentially long) input list of candidates, which we call the concept-bank. The concept-bank may be taken from a generic dictionary or constructed from task-specific prior knowledge. An image-language embedding method (e.g. CLIP) is used to map the images and the concept-bank into a shared feature space. To select the K concepts that best explain the data, we formulate our problem as a K-uncapacitated facility location problem. We use an efficient optimization technique to scale the local search algorithm to very large concept-banks. The output of our method is a set of K principal concepts that summarize the dataset. Our approach provides a more explicit summary than selecting K representative images, which are often ambiguous. As a further application of our method, the K principal concepts can be used to classify the dataset into K groups. Extensive experiments demonstrate the efficacy of our approach.
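The selection step can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper. Images and concepts are assumed to be unit-norm embeddings in a shared space (toy orthonormal vectors stand in for CLIP features). Each image is assumed to pay a cost of 1 minus its cosine similarity to the closest selected concept, with no facility opening costs, so the objective reduces to a K-median-style problem. A plain single-swap local search is used instead of the paper's scaled optimization technique, which is not described here. The function names `assignment_cost`, `select_k_concepts` and `classify` and the toy concept-bank are hypothetical.

```python
import numpy as np


def assignment_cost(sim, chosen):
    """Assumed facility-location cost: each image pays 1 - (cosine similarity to its closest chosen concept)."""
    return float(np.sum(1.0 - sim[:, chosen].max(axis=1)))


def select_k_concepts(sim, k, max_rounds=50, seed=0):
    """Plain single-swap local search over the columns (concepts) of `sim`.

    sim: (n_images, n_concepts) cosine-similarity matrix.
    Returns a sorted list of k concept indices that locally minimise assignment_cost.
    """
    rng = np.random.default_rng(seed)
    n_concepts = sim.shape[1]
    chosen = [int(c) for c in rng.choice(n_concepts, size=k, replace=False)]
    best = assignment_cost(sim, chosen)
    for _ in range(max_rounds):
        improved = False
        for slot in range(k):
            for c in range(n_concepts):
                if c in chosen:
                    continue
                trial = chosen.copy()
                trial[slot] = c  # swap one selected concept for an unselected one
                cost = assignment_cost(sim, trial)
                if cost < best - 1e-12:  # accept strictly improving swaps only
                    chosen, best = trial, cost
                    improved = True
        if not improved:  # no single swap helps: a local optimum
            break
    return sorted(chosen)


def classify(sim, chosen):
    """Assign every image to its most similar selected concept, giving K groups."""
    chosen = np.asarray(chosen)
    return chosen[sim[:, chosen].argmax(axis=1)]


# Hypothetical toy data standing in for CLIP embeddings: one unit vector per concept.
concept_bank = ["tiger", "lion", "kayaking", "rowing", "happy", "sad"]
concept_emb = np.eye(len(concept_bank))

# 30 "images" drawn near three of the concepts, L2-normalised like CLIP features.
rng = np.random.default_rng(0)
centers = concept_emb[np.repeat([0, 2, 4], 10)]
image_emb = centers + rng.normal(scale=0.05, size=centers.shape)
image_emb /= np.linalg.norm(image_emb, axis=1, keepdims=True)

sim = image_emb @ concept_emb.T  # cosine similarity (both sides are unit-norm)
chosen = select_k_concepts(sim, k=3)
labels = classify(sim, chosen)
print([concept_bank[c] for c in chosen])  # expected: ['tiger', 'kayaking', 'happy']
```

In this toy setup, any selection that misses one of the three image clusters leaves that cluster with a cost of roughly 1 per image. Every local optimum of the swap search therefore covers all three clusters. The final `classify` call shows how the selected concepts double as K group labels for the images.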


