Modeling lies at the core of a wide variety of tasks in both the financial and the insurance industries. The rise and development of machine learning and deep learning models have created many opportunities to improve our modeling toolbox. Breakthroughs in these fields often require large amounts of data. Such large datasets are often not publicly available in finance and insurance, mainly due to privacy and ethics concerns. This lack of data is currently one of the main hurdles in developing better models. One possible way to alleviate this issue is generative modeling. Generative models are capable of simulating fake but realistic-looking data, also referred to as synthetic data, that can be shared more freely. Generative Adversarial Networks (GANs) are one such class of models, and they increase our capacity to fit very high-dimensional distributions of data. While research on GANs is an active topic in fields like computer vision, they have found limited adoption within the human sciences, such as economics and insurance. The reason is that in these fields most questions are inherently about the identification of causal effects, whereas to this day neural networks, which are at the center of the GAN framework, focus mostly on high-dimensional correlations. In this paper we study the causal preservation capabilities of GANs and whether the synthetic data they produce can reliably be used to answer causal questions. This is done by performing causal analyses on the synthetic data produced by a GAN under increasingly lenient assumptions. We consider the cross-sectional case, the time series case and the case with a complete structural model. It is shown that in the simple cross-sectional scenario, where correlation equals causation, the GAN preserves causality, but that challenges arise for more advanced analyses.