On the causality-preservation capabilities of generative modelling

Modeling lies at the core of both the financial and the insurance industries for a wide variety of tasks. The rise and development of machine learning and deep learning models have created many opportunities to improve our modeling toolbox. Breakthroughs in these fields often require large amounts of data. Such large datasets are often not publicly available in finance and insurance, mainly due to privacy and ethics concerns. This lack of data is currently one of the main hurdles in developing better models. One option for alleviating this issue is generative modeling. Generative models are capable of simulating fake but realistic-looking data, also referred to as synthetic data, that can be shared more freely. Generative Adversarial Networks (GANs) are one such class of models, and they increase our capacity to fit very high-dimensional distributions of data. While research on GANs is an active topic in fields like computer vision, they have found limited adoption within the human sciences, such as economics and insurance. The reason is that in these fields most questions are inherently about the identification of causal effects, while to this day neural networks, which are at the center of the GAN framework, focus mostly on high-dimensional correlations. In this paper we study the causal preservation capabilities of GANs and whether the produced synthetic data can reliably be used to answer causal questions. This is done by performing causal analyses on the synthetic data produced by a GAN under increasingly lenient assumptions. We consider the cross-sectional case, the time series case, and the case with a complete structural model. It is shown that in the simple cross-sectional scenario, where correlation equals causation, the GAN preserves causality, but that challenges arise for more advanced analyses.
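The simplest cross-sectional check described above can be illustrated with a minimal sketch. It is not the paper's experimental setup: the data-generating process, the true effect of 2.0, and the sample size are assumptions chosen for illustration, and a fitted bivariate Gaussian stands in for the GAN to keep the example self-contained. The idea is to estimate the same regression coefficient on the real and on the synthetic sample and compare the two.

```python
import numpy as np


def ols_slope(x, y):
    """Return the OLS slope of y on x, fitted with an intercept."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


rng = np.random.default_rng(0)

# Hypothetical "real" data: x causes y, no confounding, so the
# regression slope coincides with the causal effect (assumed 2.0).
n = 5000
true_effect = 2.0
x = rng.normal(size=n)
y = true_effect * x + rng.normal(size=n)
real = np.column_stack([x, y])

# Stand-in generative model (NOT a GAN): fit a bivariate Gaussian
# to the real sample and draw a synthetic sample of the same size.
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=n)

# Run the same "causal analysis" on both samples and compare.
slope_real = ols_slope(real[:, 0], real[:, 1])
slope_synth = ols_slope(synthetic[:, 0], synthetic[:, 1])
print(f"slope on real data:      {slope_real:.3f}")
print(f"slope on synthetic data: {slope_synth:.3f}")
```

In this unconfounded setting any generator that reproduces the joint distribution of the two variables also reproduces the slope, which is why the check is easy to pass. The time series and structural-model cases mentioned in the abstract would need richer analyses than this comparison.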

