MultiLegalSBD: A Multilingual Legal Sentence Boundary Detection Dataset

May 2, 2023

Tobias Brugger, Matthias Stürmer, Joel Niklaus

From arXiv; accepted at ICAIL 2023

Sentence Boundary Detection (SBD) is one of the foundational building blocks of Natural Language Processing (NLP), with incorrectly split sentences heavily influencing the output quality of downstream tasks. It is a challenging task for algorithms, especially in the legal domain, given the complex and varied sentence structures used there. In this work, we curated a diverse multilingual legal dataset consisting of over 130,000 annotated sentences in 6 languages. Our experimental results indicate that the performance of existing SBD models is subpar on multilingual legal data. We trained and tested monolingual and multilingual models based on CRF, BiLSTM-CRF, and transformers, demonstrating state-of-the-art performance. We also show that our multilingual models outperform all baselines in the zero-shot setting on a Portuguese test set. To encourage further research and development by the community, we have made our dataset, models, and code publicly available.
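
To illustrate why legal text is hard for simple splitters, the following minimal Python sketch applies a deliberately naive punctuation rule to a legal-style passage. The example text and the `naive_split` helper are invented for illustration; they are not taken from the dataset and are not one of the paper's baselines or models.

```python
import re


def naive_split(text: str) -> list:
    """Split after '.', '!' or '?' when followed by whitespace.

    This is a deliberately naive baseline, not a method from the paper.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


# Hypothetical legal-style passage containing two real sentences.
text = "The claim fails under Art. 41 para. 1 of the Code. The appeal is dismissed."

sentences = naive_split(text)
for s in sentences:
    print(repr(s))
```

The rule cuts after the abbreviations "Art." and "para.", so the two-sentence passage comes back as four fragments. Errors of this kind are what learned models such as CRF, BiLSTM-CRF, and transformers are trained to avoid.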
