Automatic Detection of Industry Sectors in Legal Articles Using Machine Learning Approaches

From arXiv: 26 pages, 5 figures, 3 tables. The paper was presented at 'Classification and Data Science in the Digital Age', the 17th conference of the International Federation of Classification Societies (IFCS2022), Porto, Portugal, https://ifcs2022.fep.up.pt/

The ability to automatically identify industry sector coverage in articles on legal developments, or in news articles of any kind, can bring many benefits to both readers and content creators. With articles tagged by industry coverage, readers around the world could find legal news specific to their region and professional industry. Writers, in turn, would learn which industries lack coverage and which industries readers are currently most interested in, and could direct their writing efforts towards more inclusive and relevant legal news coverage. This paper investigates a Machine Learning-powered industry analysis approach that combines Natural Language Processing (NLP) with Statistical and Machine Learning (ML) techniques. A dataset of over 1,700 annotated legal articles was created for the identification of six industry sectors. Text-based and legal-based features were extracted from the text. Both traditional ML methods (e.g. gradient boosting machine algorithms and decision-tree-based algorithms) and deep neural networks (e.g. transformer models) were applied to compare the performance of the predictive models. The system achieved promising results, with area under the receiver operating characteristic curve (AUC) scores above 0.90 and F-scores above 0.81 with respect to the six industry sectors. The experimental results show that the suggested automated industry analysis, which employs ML techniques, allows large collections of text data to be processed in an easy, efficient, and scalable way. Traditional ML methods perform better than deep neural networks when only a small, domain-specific training dataset is available.
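To make the two reported metrics concrete, the following is a minimal sketch of how an AUC score and an F-score can be computed for a single industry sector treated as a binary label. The labels, classifier scores, and the 0.5 decision threshold are invented for illustration and do not come from the paper, and evaluating each sector separately is an assumption about the setup, not a description of the authors' code.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive article is scored
    above a random negative one (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 as the harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: 1 = article covers the sector, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical classifier scores for the same eight articles.
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]
# Assumed decision threshold of 0.5 to turn scores into hard labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
print(f"AUC = {auc:.4f}, F1 = {f1:.4f}")
```

AUC is computed from the raw scores and does not depend on a threshold, whereas the F-score depends on where the threshold is set, which is one reason the two numbers are usually reported together.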

