Evaluation of Representation Models for Text Classification with AutoML Tools

Automated Machine Learning (AutoML) has become increasingly successful on tabular data in recent years. However, processing unstructured data such as text remains a challenge and is not widely supported by open-source AutoML tools. This work compares three manually created text representations with the text embeddings that AutoML tools create automatically. Our benchmark includes four popular open-source AutoML tools and eight text classification datasets. The results show that straightforward text representations outperform AutoML tools that rely on automatically created text embeddings.
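The abstract does not name the three manually created representations. As a hedged illustration of what a "straightforward" text representation can look like, here is a minimal sketch. It assumes a TF-IDF bag-of-words vector and pairs it with a simple nearest-centroid classifier based on cosine similarity. Neither choice is claimed to be the paper's setup, and it is not an AutoML tool. All function names (`tokenize`, `fit_idf`, `tfidf`, `train_centroids`, `predict`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization (no punctuation handling).
    return text.lower().split()


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    # Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Term counts weighted by IDF, then L2-normalized; unseen terms are dropped.
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Dot product of sparse vectors; equals cosine when b is L2-normalized
    # (a's norm is the same for every class, so argmax is unaffected).
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def train_centroids(texts: list[str], labels: list[str]):
    # Hypothetical classifier: one L2-normalized mean TF-IDF vector per class.
    docs = [tokenize(t) for t in texts]
    idf = fit_idf(docs)
    sums: dict[str, Counter] = {}
    for doc, label in zip(docs, labels):
        sums.setdefault(label, Counter()).update(tfidf(doc, idf))
    centroids = {}
    for label, total in sums.items():
        norm = math.sqrt(sum(v * v for v in total.values())) or 1.0
        centroids[label] = {t: v / norm for t, v in total.items()}
    return idf, centroids


def predict(text: str, idf: dict[str, float], centroids: dict) -> str:
    # Assign the class whose centroid is most similar to the document vector.
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


train_texts = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "awful plot boring acting",
]
train_labels = ["pos", "pos", "neg", "neg"]
idf, centroids = train_centroids(train_texts, train_labels)
print(predict("loved the great plot", idf, centroids))  # expected: pos
```

The design keeps the representation step (`fit_idf`/`tfidf`) separate from the classifier. In a benchmark like the one described, the same fixed vectors could then be handed to any downstream learner, for example an AutoML tool operating on tabular features.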


Related content

TOOLS


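This new edition of the TOOLS conference series revives the tradition of the 50 conferences held from 1989 to 2012. TOOLS originally stood for "Technology of Object-Oriented Languages and Systems" and later grew to cover all innovative aspects of software technology. Many of today's most important software concepts were first introduced there. TOOLS 50+1, held in 2019 near Kazan, Russia, continued the series in the same spirit of innovation: a passion for everything related to software, a combination of scientific rigor and industrial applicability, and an open attitude that welcomes all trends and communities in the field. Official website: http://tools2019.innopolis.ru/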

Introduction to Linux (96-slide PPT)
