Interactive data exploration (IDE) is an effective way of comprehending big data, whose volume and complexity exceed human capacity. The main goal of IDE is to discover the regions of a database that interest a user through multiple rounds of user labelling. Existing IDE systems adopt an active-learning framework, in which users iteratively label the interestingness of selected tuples. The exploration process can be viewed as training a classifier that determines whether a database tuple is interesting to the user. An efficient exploration therefore needs very few labelling iterations to reach the data region of interest. In this work, we treat data exploration as a few-shot learning problem, where the classifier is learned from only a few training examples, that is, exploration iterations. To this end, we propose a learning-to-explore framework based on meta-learning, which learns how to learn a classifier from automatically generated meta-tasks, so that the exploration process is substantially shortened. Extensive experiments on real datasets show that our proposal outperforms existing explore-by-example solutions in terms of accuracy and efficiency.
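The active-learning loop that existing explore-by-example systems follow can be sketched in a few lines. The sketch below is only an illustration of that baseline loop, not the meta-learning framework proposed here: the hidden box in `oracle` stands in for the user, the 1-nearest-neighbour classifier and the margin-based selection rule are assumptions chosen for brevity, and all function names are hypothetical.

```python
import random

def oracle(t):
    """Stand-in for the user (assumption): interesting tuples lie in a hidden box."""
    x, y = t
    return 0.3 <= x <= 0.7 and 0.3 <= y <= 0.7

def dist2(a, b):
    """Squared Euclidean distance between two 2-D tuples."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def predict(t, labelled):
    """1-nearest-neighbour classifier over the tuples labelled so far."""
    nearest = min(labelled, key=lambda item: dist2(t, item[0]))
    return nearest[1]

def margin(t, labelled):
    """Small margin means the tuple is about equally close to both classes."""
    pos = [dist2(t, p) for p, lab in labelled if lab]
    neg = [dist2(t, p) for p, lab in labelled if not lab]
    if not pos or not neg:
        return 0.0  # only one class seen so far: every tuple is equally uncertain
    return abs(min(pos) - min(neg))

def explore(tuples, rounds, rng):
    """Ask the user to label one tuple per round, choosing the most uncertain one."""
    pool = list(tuples)
    first = pool.pop(rng.randrange(len(pool)))  # random seed tuple
    labelled = [(first, oracle(first))]
    for _ in range(rounds - 1):
        pick = min(pool, key=lambda t: margin(t, labelled))
        pool.remove(pick)
        labelled.append((pick, oracle(pick)))
    return labelled

def accuracy(tuples, labelled):
    """Fraction of database tuples whose predicted label matches the user's."""
    hits = sum(predict(t, labelled) == oracle(t) for t in tuples)
    return hits / len(tuples)

rng = random.Random(0)
database = [(rng.random(), rng.random()) for _ in range(500)]
labelled = explore(database, rounds=30, rng=rng)
acc = accuracy(database, labelled)
print(f"labelled {len(labelled)} tuples, accuracy on the database: {acc:.2f}")
```

Each call to `oracle` corresponds to one round of user labelling, so `rounds` is the cost that a few-shot approach aims to reduce.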