Causal discovery (CD) from time-varying data is important in neuroscience, medicine, and machine learning. Techniques for CD include randomized experiments, which are generally unbiased but expensive. They also include algorithms such as regression, matching, and Granger causality, which are correct only under strong assumptions made by their human designers. However, as has been found in other areas of machine learning, human designers are usually not quite right and are usually outperformed by data-driven approaches. Here we test whether causal discovery can be improved in a data-driven way. We take a system with a large number of causal components (transistors), the MOS 6502 processor, and meta-learn a causal discovery procedure represented as a neural network. We find that this learned procedure far outperforms human-designed causal discovery procedures such as mutual information and Granger causality. We argue that the causality field should consider, where possible, a supervised approach, in which CD procedures are learned from large datasets with known causal relations instead of being designed by a human specialist. Our findings point to a new approach to CD in neural and medical data and for the broader machine learning community.
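The contrast between a hand-designed CD score and a procedure fitted to data with known causal relations can be illustrated with a minimal sketch in Python with NumPy. The first function is a hand-designed pairwise score of the kind used as a baseline: a plug-in estimate of lagged mutual information between two binary traces (for example, transistor on/off states). The second is a toy "learned" decision rule that picks its threshold from pairs whose causal status is known. Everything here is an assumption chosen for illustration: the function names are hypothetical, the 0/1 encoding and one-step lag are simplifications, and the study itself meta-learns a neural network, not a single threshold.

```python
import numpy as np


def lagged_mutual_information(x, y, lag=1):
    """Plug-in estimate of I(X_{t-lag}; Y_t) in bits for two 0/1 time series.

    Hypothetical helper: a hand-designed baseline score, not the paper's code.
    """
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    if len(x) != len(y) or not 1 <= lag < len(x):
        raise ValueError("series must have equal length and 1 <= lag < length")
    past, present = x[:-lag], y[lag:]
    # 2x2 joint histogram of (past value of x, present value of y).
    joint = np.zeros((2, 2))
    np.add.at(joint, (past, present), 1)
    joint /= joint.sum()
    # Product of the two marginals, formed as a 2x2 outer product.
    independent = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log2(joint[nonzero] / independent[nonzero])))


def fit_threshold(scores, labels):
    """Pick the score threshold that best separates known causal (1) from non-causal (0) pairs.

    Hypothetical toy stand-in for a learned CD procedure: the decision rule is
    fitted to labelled examples instead of being fixed by a human designer.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    candidates = np.unique(scores)
    accuracies = [np.mean((scores >= t) == labels) for t in candidates]
    return float(candidates[int(np.argmax(accuracies))])


# Usage: a random binary trace a drives b with a one-step delay (b[t] = a[t-1]).
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=1000)
b = np.roll(a, 1)
b[0] = 0
print(lagged_mutual_information(a, b))  # a -> b: close to 1 bit
print(lagged_mutual_information(b, a))  # b -> a: close to 0
print(fit_threshold([0.01, 0.02, 0.9, 0.95], [0, 0, 1, 1]))  # threshold learned from labels
```

Replacing the single threshold with a model trained on many labelled pairs is the direction the supervised approach argues for. Note that lagged mutual information, like Granger causality, can still be fooled by common drivers and indirect paths, which is exactly the kind of failure a learned procedure could in principle be trained to avoid.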