Statistical matching methods are widely used in the social and health sciences to estimate causal effects from observational data. The objective is often to find comparable groups with similar covariate distributions in a dataset, with the aim of reducing bias, as in a randomized experiment. We aim to develop a foundation for deterministic methods that yield low-bias results while retaining interpretability. The proposed method matches on the covariates and computes all possible maximal exact matches for a given dataset without introducing numerical errors. Notable advantages of our method over existing matching algorithms are that it uses all available information for exact matches, introduces no additional bias, can be combined with other methods for inexact matching to reduce pruning, and computes the result quickly and deterministically. For a given dataset, the result is therefore provably unique for exact matches in the mathematical sense. We provide proofs, implementation instructions, and a numerical example computed on a complete survey for comparison.
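To make the notion of exact matching on covariates concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that units are given as dictionaries, that a boolean field named `treated` marks the treatment group, and that an exact match is a stratum of units with identical covariate values containing at least one treated and one control unit. The function name `exact_match_strata` and the toy data are hypothetical.

```python
from collections import defaultdict


def exact_match_strata(units, covariates, treatment="treated"):
    """Group units by identical covariate values.

    Returns only those strata that contain at least one treated and one
    control unit. Each stratum keeps every unit sharing the covariate
    values, so no exactly matching unit is discarded.
    """
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for idx, unit in enumerate(units):
        # The covariate tuple is the matching key (assumed hashable).
        key = tuple(unit[c] for c in covariates)
        group = "treated" if unit[treatment] else "control"
        strata[key][group].append(idx)
    # Drop strata that lack a counterpart in the other group.
    return {k: v for k, v in strata.items() if v["treated"] and v["control"]}


# Hypothetical toy data, for illustration only.
units = [
    {"age": 30, "sex": "f", "treated": True},
    {"age": 30, "sex": "f", "treated": False},
    {"age": 30, "sex": "f", "treated": False},
    {"age": 45, "sex": "m", "treated": True},
    {"age": 52, "sex": "f", "treated": False},
]

matches = exact_match_strata(units, ["age", "sex"])
```

Because the grouping depends only on the covariate values, repeated runs on the same dataset give the same strata, which illustrates the deterministic behaviour described above. Units without an exact counterpart (here the units at indices 3 and 4) are left unmatched and could be passed to a separate inexact matching method.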