In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set, given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, even though tabular data is a very common data representation. In this paper, we propose TabularNCD, a new method for discovering novel classes in tabular data. We show how to extract knowledge from already known classes to guide the discovery of novel classes in tabular data containing heterogeneous variables. Part of this process relies on a new method for defining pseudo labels, and we follow recent findings in Multi-Task Learning to optimize a joint objective function. Our method demonstrates that NCD is applicable not only to images but also to heterogeneous tabular data. Extensive experiments evaluate our method and demonstrate its effectiveness against 3 competitors on 7 diverse public classification datasets.