Creating large, high-quality labeled datasets has become one of the major bottlenecks in developing machine learning applications. Multiple techniques have been developed either to decrease the dependence on labeled data (zero/few-shot learning, weak supervision) or to improve the efficiency of the labeling process (active learning). Among these, weak supervision has been shown to reduce labeling costs by employing hand-crafted labeling functions designed by domain experts. We propose AutoWS -- a novel framework that increases the efficiency of the weak supervision process while decreasing the dependency on domain experts. Our method requires only a small set of labeled examples per label class and automatically creates a set of labeling functions that assign noisy labels to large amounts of unlabeled data. The noisy labels can then be aggregated into probabilistic labels used by a downstream discriminative classifier. Our framework is fully automatic and requires no hyper-parameter specification from users. We compare our approach with state-of-the-art work on weak supervision and noisy-label training. Experimental results show that our method outperforms competitive baselines.
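To make the weak supervision pipeline described above concrete, the sketch below shows a minimal (hypothetical) version of the core loop: several noisy labeling functions vote on an unlabeled example, abstaining when unsure, and their votes are aggregated into a probabilistic label. This is only an illustration of the general paradigm, not the AutoWS method itself; all function names and the keyword heuristics are invented for the example.

```python
from collections import Counter

ABSTAIN = -1  # labeling functions may abstain on examples they cannot judge

# Hypothetical keyword-based labeling functions for binary sentiment (0 = neg, 1 = pos).
# In AutoWS such functions would be created automatically from a few labeled examples.
def lf_positive_words(text):
    return 1 if any(w in text.lower() for w in ("great", "excellent", "love")) else ABSTAIN

def lf_negative_words(text):
    return 0 if any(w in text.lower() for w in ("terrible", "awful", "hate")) else ABSTAIN

def lf_exclamation(text):
    return 1 if "!" in text and "not" not in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_positive_words, lf_negative_words, lf_exclamation]

def probabilistic_label(text, n_classes=2):
    """Aggregate noisy LF votes into a probabilistic label (simple vote-count model;
    real label models weight each LF by its estimated accuracy)."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    total = sum(counts.values())
    if total == 0:
        return [1.0 / n_classes] * n_classes  # all LFs abstained: uniform distribution
    return [counts.get(c, 0) / total for c in range(n_classes)]

# Two positive LFs fire, the negative one abstains -> confident positive label.
print(probabilistic_label("This movie is great!"))  # [0.0, 1.0]
```

The resulting probabilistic labels would then serve as training targets for a downstream discriminative classifier, which can generalize beyond the coverage of the labeling functions.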