Navigating the corporate disclosure gap: Modelling of Missing Not at Random Carbon Data

Source: arXiv. This paper was selected for oral presentation at the Global Research Alliance for Sustainable Finance and Investment (GRASFI) Conference in Beijing, China (2021). It was also submitted for publication to Energy Economics and is currently under review.

Although corporate carbon emissions data is a key indicator of corporate climate performance, only approximately 65% of large and mid-sized companies globally disclose it. With investors increasingly looking to integrate climate risk into their investment strategies and risk reporting, this creates demand for robust prediction models that can generate reliable estimates for missing carbon disclosures. However, these estimates lack transparency and are frequently used in the investment decision process with the same confidence as corporate reported data. As disclosures remain mostly voluntary and the propensity to disclose is shaped by several factors (e.g. size, sector, geography), missing emissions data should be assumed to be missing not at random (MNAR). Widely used estimation methods (e.g. linear regression models), however, typically do not correct for MNAR bias and do not accurately reflect the uncertainty of the estimated data. The objective of this paper is to address these issues: (1) account for the uncertainty of the missing data by obtaining regression coefficients through multiple imputation (MI); (2) correct for potential bias by using MI algorithms based on Heckman's sample selection model, as introduced by Galimard et al.; and (3) estimate missing carbon disclosures with linear models based on MI and report on the uncertainty of the predicted values, measured as the length of the prediction interval. In the simulation, our approach resulted in an accuracy gain, based on root mean squared error, of up to 30%, and a coverage rate up to 40% higher than that of the existing models. When applied to commercial data, the results suggested up to 20% higher coverage for the proposed methods.
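The quantities named in the abstract (pooled MI estimates, root mean squared error, coverage rate, and prediction interval length) can be made concrete with a minimal sketch. It assumes the standard Rubin's rules for pooling estimates across imputed datasets, which the abstract does not spell out, and it does not implement the Heckman-based imputation step itself. All function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates with Rubin's rules (assumed here).

    estimates: point estimates of one coefficient, one per imputed dataset
    variances: the matching within-imputation variances
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


def rmse(actual, predicted):
    """Root mean squared error between reported and estimated values."""
    return sqrt(mean([(a - p) ** 2 for a, p in zip(actual, predicted)]))


def coverage_rate(actual, lower, upper):
    """Share of reported values that fall inside their prediction interval."""
    hits = [lo <= a <= hi for a, lo, hi in zip(actual, lower, upper)]
    return sum(hits) / len(hits)


def mean_interval_length(lower, upper):
    """Average prediction interval length, used as an uncertainty measure."""
    return mean([hi - lo for lo, hi in zip(lower, upper)])


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    coef, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print("pooled coefficient:", coef, "total variance:", total_var)

    reported = [1.0, 5.0, 10.0]
    lower, upper = [0.0, 6.0, 9.0], [2.0, 7.0, 11.0]
    print("coverage:", coverage_rate(reported, lower, upper))
    print("mean interval length:", mean_interval_length(lower, upper))
```

Keeping the pooling separate from the evaluation metrics means the same `coverage_rate` and `rmse` helpers could be applied to any estimator, which is how a comparison between MI-based and single-regression estimates would typically be set up.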
