In recent years, financial agents, especially private and institutional investors, have increasingly demanded that companies report on climate-related financial risks. Firms can be expected to disclose a large volume of textual information on these risks in their financial and non-financial reports in the short term, particularly in response to the growing body of regulation on the matter. To this end, this paper applies state-of-the-art NLP techniques to detect climate-change-related content in text corpora. We use transfer learning to fine-tune two transformer models: BERT and ClimateBert, a recently published DistilRoBERTa-based model specifically tailored for climate text classification. Both models are based on the transformer architecture, which enables learning the contextual relationships between words in a text. We fine-tune both models on the novel ClimaText database, which consists of data collected from Wikipedia, 10-K reports and web-based claims. The text classification model obtained by fine-tuning ClimateBert on ClimaText outperforms both the models built with BERT and the current state-of-the-art transformer for this particular problem. Our study is the first to apply the recently published ClimateBert model to the ClimaText database. Based on our results, ClimateBert fine-tuned on ClimaText is an outstanding tool among pre-trained NLP transformer models, one that investors, institutional agents and companies themselves can and should use to monitor the disclosure of climate risk in financial reports. In addition, our transfer learning methodology is computationally cheap, so any organization can carry it out.
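As an illustration only, the fine-tuning step described above might be sketched as follows with the Hugging Face `transformers` and PyTorch libraries. This is not the authors' code: the checkpoint names, the binary label convention (1 = climate-related), and the hyperparameters (`epochs`, `batch_size`, `lr`) are all assumptions made for the sketch, and loading the ClimaText data is left out.

```python
from typing import Iterator, List, Sequence, Tuple

# Assumed Hugging Face Hub checkpoint names; the abstract does not specify them.
CHECKPOINTS = {
    "bert": "bert-base-uncased",
    "climatebert": "climatebert/distilroberta-base-climate-f",
}


def batches(
    texts: Sequence[str], labels: Sequence[int], size: int
) -> Iterator[Tuple[List[str], List[int]]]:
    """Yield consecutive (texts, labels) mini-batches of at most `size` items."""
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size]), list(labels[start:start + size])


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predicted) != len(gold) or len(gold) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def fine_tune(checkpoint, texts, labels, epochs=1, batch_size=16, lr=2e-5):
    """Fine-tune a pretrained encoder with a fresh 2-class head.

    Hyperparameter defaults are illustrative assumptions, not reported values.
    """
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # Reuses the pretrained weights and adds a new classification layer.
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for batch_texts, batch_labels in batches(texts, labels, batch_size):
            enc = tokenizer(
                batch_texts, padding=True, truncation=True, return_tensors="pt"
            )
            out = model(**enc, labels=torch.tensor(batch_labels))
            out.loss.backward()  # cross-entropy loss computed by the model
            optimizer.step()
            optimizer.zero_grad()
    return tokenizer, model


def predict(tokenizer, model, texts):
    """Return the predicted class index (0 or 1) for each input text."""
    import torch

    model.eval()
    with torch.no_grad():
        enc = tokenizer(
            list(texts), padding=True, truncation=True, return_tensors="pt"
        )
        return model(**enc).logits.argmax(dim=-1).tolist()
```

Under these assumptions, calling `fine_tune(CHECKPOINTS["climatebert"], train_texts, train_labels)` and then comparing `predict(...)` against held-out labels with `accuracy` would give a rough analogue of the BERT versus ClimateBert comparison; `train_texts` and `train_labels` are hypothetical names for sentences and labels drawn from ClimaText.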