NeuCrowd: Neural Sampling Network for Representation Learning with Crowdsourced Labels

Representation learning approaches require a massive amount of discriminative training data, which is unavailable in many scenarios, such as healthcare, smart cities, and education. In practice, people resort to crowdsourcing to obtain annotated labels. However, due to issues such as data privacy, budget limitations, and a shortage of domain-specific annotators, the number of crowdsourced labels is still very limited. Moreover, because annotators have diverse expertise, crowdsourced labels are often inconsistent. Thus, directly applying existing supervised representation learning (SRL) algorithms can easily lead to overfitting and yield suboptimal solutions. In this paper, we propose NeuCrowd, a unified framework for SRL from crowdsourced labels. The proposed framework (1) creates a sufficient number of high-quality n-tuplet training samples by utilizing safety-aware sampling and robust anchor generation; and (2) automatically learns a neural sampling network that learns to adaptively select effective samples for SRL networks. The proposed framework is evaluated on one synthetic and three real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code publicly available at https://github.com/tal-ai/NeuCrowd_KAIS2021.
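The abstract describes NeuCrowd's sampling only at a high level. To show roughly what an n-tuplet built from crowdsourced labels can look like, here is a minimal sketch under explicitly hypothetical assumptions. Majority-vote agreement stands in for "safety," and the per-class mean of confidently labeled items stands in for a "robust anchor." This is not the NeuCrowd algorithm; the linked repository is the authoritative source for that. The names `aggregate`, `mean_vector`, `build_tuplets`, and `min_agreement` are illustrative only.

```python
import random
from collections import Counter

def aggregate(votes):
    """Return the majority-vote label and its agreement ratio for one item."""
    label, top = Counter(votes).most_common(1)[0]
    return label, top / len(votes)

def mean_vector(vectors):
    """Element-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def build_tuplets(features, crowd_votes, n=3, min_agreement=0.7, seed=0):
    """Build n-tuplets (anchor, positive, n-2 negatives) from crowd-labeled items.

    Hypothetical stand-ins, not NeuCrowd's procedure:
    - an item is "safe" if its majority label has agreement >= min_agreement;
    - each class anchor is the mean feature vector of that class's safe items.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, votes in zip(features, crowd_votes):
        label, agreement = aggregate(votes)
        if agreement >= min_agreement:  # drop items the annotators disagree on
            by_class.setdefault(label, []).append(x)
    anchors = {c: mean_vector(xs) for c, xs in by_class.items()}
    tuplets = []
    for c, positives in by_class.items():
        # Negatives: safe items from every other class.
        negatives = [x for other, xs in by_class.items() if other != c for x in xs]
        if len(negatives) < n - 2:
            continue  # not enough negatives to fill a tuplet for this class
        for pos in positives:
            tuplets.append((anchors[c], pos, *rng.sample(negatives, n - 2)))
    return tuplets

# Toy data: 2-D features, five (simulated) annotators per item.
features = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1], [0.5, 0.5]]
votes = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1], [1, 0, 1, 0, 1]]
tuplets = build_tuplets(features, votes, n=3)
print(len(tuplets))  # 4: the last item (3/5 agreement) is filtered out
```

Filtering on agreement trades coverage for label quality. The last toy item has only 3 of 5 votes in agreement, so it is dropped rather than risk a noisy positive or negative.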


Related content

Representation learning


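Representation learning learns vector representations from training data, which overcomes the limitations of hand-crafted approaches. Representation learning methods generally fall into two categories: unsupervised and supervised. Most unsupervised representation learning methods use the latent variables of autoencoders (e.g., denoising and sparse autoencoders) as representations. The more recent variational autoencoder is more tolerant of noise and outliers. However, inferring the latent structure of given data exactly is practically infeasible, so several approximate inference strategies have been proposed. In addition, some unsupervised representation learning methods aim to approximate a specific similarity measure. For example, one unsupervised similarity-preserving representation learning framework uses matrix factorization to preserve pairwise dynamic time warping (DTW) similarities. Another approach learns DTW-preserving shapelets, so that Euclidean distances in the transformed space approximate the true DTW distances of the original data.

Supervised representation learning methods can exploit label information to better capture the semantic structure of the data. Siamese networks and triplet networks are two currently popular models. Their goal is to maximize inter-class distances while minimizing intra-class distances.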
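The distances and losses mentioned in this overview can be made concrete with two textbook primitives. The first is dynamic time warping (DTW), which the similarity-preserving methods above try to approximate. The second is the triplet margin loss used by triplet networks. This is a minimal sketch, not an implementation of any specific framework mentioned here. The absolute-difference DTW cost, the Euclidean embedding distance, the default margin of 1.0, and the function names are assumptions chosen for illustration.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the negative is `margin` farther than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

print(dtw_distance([0, 1, 2], [0, 0, 1, 2]))  # 0.0
print(triplet_loss([0, 0], [0, 1], [3, 4]))   # 0.0: negative already far enough
print(triplet_loss([0, 0], [3, 4], [0, 1]))   # 5.0: positive farther than negative
```

DTW aligns the two sequences of different lengths at zero cost, whereas a point-wise Euclidean distance would not even be defined for them. The triplet loss is zero once the negative is at least `margin` farther from the anchor than the positive. Otherwise it is positive, which pushes same-class embeddings together and different-class embeddings apart.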

[ICML 2020] Confidence-Aware Learning for Deep Neural Networks

Zero-Shot Learning for Text Classification

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

[Carnegie Mellon University (CMU)] Learning Fair Representations
