The Application of Active Query K-Means in Text Classification

From arXiv: 6 pages, 3 algorithms, 4 tables, 8 figures. For source code and questions, please email Yukun Jiang at jy2363@nyu.edu; a reply will follow shortly.

Active learning is a state-of-the-art machine learning approach for dealing with an abundance of unlabeled data. In natural language processing, annotating all of the data is typically costly and time-consuming. This inefficiency motivates our application of active learning to text classification. In this research, traditional unsupervised k-means clustering is first modified into a semi-supervised version. The algorithm is then extended to the active learning scenario with a novel Penalized Min-Max selection, which makes a limited number of queries that yield more stable initial centroids. The method uses both the interactive query results from users and the underlying distance representation. Tested on a Chinese news dataset, it shows a consistent increase in accuracy while lowering the cost of training.
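The abstract does not spell out the selection rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes documents are already embedded as rows of a NumPy array `X` (for example TF-IDF vectors), that a hypothetical `oracle(i)` callable stands in for the user answering a label query, and that the "penalty" is the distance to a point's nearest neighbour, which discourages querying isolated outliers. The function names, the `penalty` weight, and that penalty term are all illustrative assumptions.

```python
import numpy as np


def penalized_min_max_queries(X, oracle, n_queries, penalty=0.5, seed=0):
    """Choose points to query: far from earlier picks, but not isolated.

    ASSUMPTION: the penalty (nearest-neighbour distance) is illustrative;
    the abstract does not define the paper's actual penalty term.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Pairwise Euclidean distances (O(n^2) memory, fine for a small sketch).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Distance from each point to its nearest *other* point.
    nearest_other = np.where(np.eye(n, dtype=bool), np.inf, dist).min(axis=1)

    chosen = [int(rng.integers(n))]  # first query is picked at random
    while len(chosen) < n_queries:
        # Min-max step: prefer the point whose closest chosen point is farthest.
        min_to_chosen = dist[:, chosen].min(axis=1)
        score = min_to_chosen - penalty * nearest_other
        score[chosen] = -np.inf  # never query the same point twice
        chosen.append(int(score.argmax()))
    # Ask the (hypothetical) oracle for the label of each chosen point.
    return {i: oracle(i) for i in chosen}


def seeded_kmeans(X, queried, n_iter=20):
    """Semi-supervised k-means: centroids start from the queried labels."""
    classes = sorted(set(queried.values()))
    idx = np.array(list(queried.keys()))
    lab = np.array([queried[i] for i in idx])
    # One initial centroid per class: the mean of its queried points.
    centroids = np.stack([X[idx[lab == c]].mean(axis=0) for c in classes])

    def assign_to_nearest(cents):
        d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        assign = assign_to_nearest(centroids)
        for k in range(len(classes)):
            if np.any(assign == k):  # keep the old centroid if a cluster empties
                centroids[k] = X[assign == k].mean(axis=0)
    return np.array(classes)[assign_to_nearest(centroids)], centroids
```

In this sketch, each cluster inherits the class label of the queried points that seeded it, so the clustering output can be read directly as a classification. A class that is never queried gets no centroid, which is one reason the choice of queries matters for the stability of the initial centroids.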

