Text data mining is the process of deriving essential information from natural-language text. Typical text mining tasks include text categorization, text clustering, topic modeling, information extraction, and text summarization. Different data sets are collected, and different algorithms are designed, for each type of task. In this paper, I present a blue-sky idea: that very large language models (VLLMs) will become an effective unified methodology for text mining. I discuss at least three advantages of this new methodology over conventional methods. Finally, I discuss the challenges in designing and developing VLLM techniques for text mining.