Neural Architecture Search (NAS) provides state-of-the-art results when trained on well-curated datasets with annotated labels. However, annotating data, or even having a balanced number of samples, can be a luxury for practitioners from different scientific fields, e.g., in the medical domain. To that end, we propose a NAS-based framework with threefold contributions: (a) we focus on the self-supervised scenario, i.e., where no labels are required to determine the architecture, (b) we assume the datasets are imbalanced, and (c) we design each component to be able to run on a resource-constrained setup, i.e., on a single GPU (e.g., Google Colab). Our components build on top of recent developments in self-supervised learning~\citep{zbontar2021barlow} and self-supervised NAS~\citep{kaplan2020self}, and extend them to the case of imbalanced datasets. We conduct experiments on an (artificially) imbalanced version of CIFAR-10 and demonstrate that our proposed method outperforms standard neural networks while using $27\times$ fewer parameters. To validate our assumptions on naturally imbalanced datasets, we also conduct experiments on ChestMNIST and COVID-19 X-ray. The results demonstrate how the proposed method can be used on imbalanced datasets, while it can be fully run on a single GPU. Code is available \href{https://github.com/TimofeevAlex/ssnas_imbalanced}{here}.