We study the data-driven selection of causal graphical models using constraint-based algorithms, which determine the existence or non-existence of edges (causal connections) in a graph based on testing a series of conditional independence hypotheses. In settings where the ultimate scientific goal is to use the selected graph to inform estimation of some causal effect of interest (e.g., by selecting a valid and sufficient set of adjustment variables), we argue that a "cautious" approach to graph selection should control the probability of falsely removing edges and prefer dense, rather than sparse, graphs. We propose a simple inversion of the usual conditional independence testing procedure: to remove an edge, test the null hypothesis of conditional association greater than some user-specified threshold, rather than the null of independence. This equivalence testing formulation to testing independence constraints leads to a procedure with desriable statistical properties and behaviors that better match the inferential goals of certain scientific studies, for example observational epidemiological studies that aim to estimate causal effects in the face of causal model uncertainty. We illustrate our approach on a data example from environmental epidemiology.
翻译:暂无翻译