The development of annotated datasets in the 21st century has helped realize the power of deep learning. However, most datasets created for the named-entity recognition (NER) task are not domain specific. The finance domain presents specific challenges for NER, and a domain-specific dataset would help push the boundaries of finance research. In this work, we develop the first high-quality NER dataset for the finance domain. To set a benchmark for the dataset, we develop and test a weak-supervision-based framework for the NER task. We extend the current weak-supervision framework to make it applicable to span-level classification. Our weak-ner framework and the dataset are publicly available on GitHub and Hugging Face.