Causal discovery (CD) from time-varying data is important in neuroscience, medicine, and machine learning. Techniques for CD include randomized experiments, which are generally unbiased but expensive. They also include algorithms such as regression, matching, and Granger causality, which are correct only under strong assumptions made by human designers. However, as other areas of machine learning have shown, human-designed assumptions are rarely quite right, and human expertise is usually outperformed by data-driven approaches. Here we test whether causal discovery can be improved in a data-driven way. We take a perturbable system with a large number of causal components (transistors), the MOS 6502 processor, acquire the causal ground truth, and learn a causal discovery procedure represented as a neural network. We find that this procedure far outperforms human-designed causal discovery procedures, such as mutual information, LiNGAM, and Granger causality, on both the MOS 6502 processor and the NetSim dataset, which simulates functional magnetic resonance imaging (fMRI) results. We argue that the causality field should consider, where possible, a supervised approach in which CD procedures are learned from large datasets with known causal relations instead of being designed by a human specialist. Our findings promise a new approach toward improving CD in neural and medical data and for the broader machine learning community.
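To make the supervised idea concrete, the following is a minimal sketch under stated assumptions, not the procedure used in this work. It swaps the MOS 6502 for a toy linear system with known edges, and swaps the neural network for a logistic regression on lagged cross-correlations. All names (`simulate_pair`, `lagged_features`, `make_dataset`, `train_logistic`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_pair(rng, edge, n=500, coupling=0.8):
    """Toy stand-in for a perturbable system with known ground truth (assumption).
    edge is "x->y", "y->x", or "none"; the driver acts with a one-step lag."""
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    if edge == "x->y":
        y[1:] += coupling * x[:-1]
    elif edge == "y->x":
        x[1:] += coupling * y[:-1]
    return x, y

def lagged_features(x, y, max_lag=3):
    """Cross-correlations at lags 1..max_lag in both directions."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    fwd = [np.mean(x[:-k] * y[k:]) for k in range(1, max_lag + 1)]  # x leads y
    bwd = [np.mean(y[:-k] * x[k:]) for k in range(1, max_lag + 1)]  # y leads x
    return np.array(fwd + bwd)

def make_dataset(rng, n_pairs):
    """Labeled examples: label 1 means x causes y, label 0 means it does not."""
    edges = ["x->y", "y->x", "none"]
    feats, labels = [], []
    for i in range(n_pairs):
        edge = edges[i % 3]
        x, y = simulate_pair(rng, edge)
        feats.append(lagged_features(x, y))
        labels.append(1.0 if edge == "x->y" else 0.0)
    return np.array(feats), np.array(labels)

def train_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient descent on the logistic loss (stand-in for a neural network)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

rng = np.random.default_rng(0)
X_train, y_train = make_dataset(rng, 300)   # pairs with known causal relations
X_test, y_test = make_dataset(rng, 150)     # held-out pairs
w, b = train_logistic(X_train, y_train)
pred = (X_test @ w + b) > 0
accuracy = np.mean(pred == (y_test == 1.0))
print(f"held-out accuracy: {accuracy:.2f}")
```

The point of the sketch is the workflow rather than the model: the decision rule is fitted to examples with known causal labels and then evaluated on held-out pairs, instead of being fixed in advance by a designer. The toy system is far easier than a processor or fMRI data, so its accuracy says nothing about the results reported here.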