Learning from few labeled tabular samples is often an essential requirement for industrial machine learning applications, as many kinds of tabular data suffer from high annotation costs or make it difficult to collect new samples for novel tasks. Despite its importance, this problem is largely under-explored in tabular learning, and existing few-shot learning schemes from other domains are not straightforward to apply, mainly because of the heterogeneous characteristics of tabular data. In this paper, we propose a simple yet effective framework for few-shot semi-supervised tabular learning, coined Self-generated Tasks from UNlabeled Tables (STUNT). Our key idea is to self-generate diverse few-shot tasks by treating randomly chosen columns as a target label. We then employ a meta-learning scheme to learn generalizable knowledge from the constructed tasks. Moreover, we introduce an unsupervised validation scheme for hyperparameter search (and early stopping) by using STUNT to generate a pseudo-validation set from unlabeled data. Our experimental results demonstrate that this simple framework brings significant performance gains on various tabular few-shot learning benchmarks compared to prior semi- and self-supervised baselines. Code is available at https://github.com/jaehyun513/STUNT.
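The core idea of treating randomly chosen columns as a pseudo-label and then meta-learning on the resulting tasks can be sketched roughly as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' released code. The chosen columns are discretized into pseudo-classes with a small k-means (an assumed step, since continuous columns must somehow become class labels), the chosen columns are simply dropped from the input, and the meta-learner is reduced to a ProtoNet-style nearest-prototype rule with an identity embedding, omitting any learnable encoder and its training loop. The helper names `kmeans_labels`, `self_generate_task`, and `prototype_predict` are hypothetical.

```python
import numpy as np


def kmeans_labels(x, k, rng, n_iter=20):
    """Minimal Lloyd's k-means: returns one cluster id (0..k-1) per row of x."""
    x = np.asarray(x, dtype=float)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = x[labels == c].mean(axis=0)
    return labels


def self_generate_task(table, n_way, k_shot, n_query, rng, max_tries=100):
    """Build one pseudo N-way K-shot task from an unlabeled table (rows x cols).

    Hypothetical helper. Returns (support_x, support_y, query_x, query_y).
    """
    n_rows, n_cols = table.shape
    for _ in range(max_tries):
        # 1) Randomly choose columns to serve as the pseudo target.
        n_target = rng.integers(1, n_cols)  # keeps at least one input column
        target_cols = rng.choice(n_cols, size=n_target, replace=False)
        input_cols = np.setdiff1d(np.arange(n_cols), target_cols)
        # 2) Discretize them into n_way pseudo-classes (assumed k-means step).
        pseudo_y = kmeans_labels(table[:, target_cols], n_way, rng)
        # 3) Sample support/query rows per class; retry if a class is too small.
        groups = [np.flatnonzero(pseudo_y == c) for c in range(n_way)]
        if min(len(g) for g in groups) < k_shot + n_query:
            continue
        s_idx, q_idx, s_y, q_y = [], [], [], []
        for c, g in enumerate(groups):
            picked = rng.choice(g, size=k_shot + n_query, replace=False)
            s_idx.extend(picked[:k_shot])
            s_y += [c] * k_shot
            q_idx.extend(picked[k_shot:])
            q_y += [c] * n_query
        x = table[:, input_cols]  # assumption: target columns are dropped from the input
        return x[s_idx], np.array(s_y), x[q_idx], np.array(q_y)
    raise RuntimeError("could not build a task with enough rows per pseudo-class")


def prototype_predict(support_x, support_y, query_x, n_way):
    """Nearest-prototype rule (ProtoNet-style) with an identity embedding."""
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in range(n_way)])
    dist = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic table whose 6 columns share one latent factor,
    # so the remaining columns carry information about the chosen ones.
    z = rng.normal(size=(300, 1))
    table = z + 0.3 * rng.normal(size=(300, 6))
    sx, sy, qx, qy = self_generate_task(table, n_way=3, k_shot=5, n_query=10, rng=rng)
    pred = prototype_predict(sx, sy, qx, n_way=3)
    print("pseudo-task query accuracy:", (pred == qy).mean())
```

In a full pipeline, an encoder would be trained to improve query accuracy over many such tasks. Accuracy on self-generated tasks held out from training is one plausible way to realize the unsupervised pseudo-validation signal described above for hyperparameter search and early stopping.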