Independence testing plays a central role in statistical and causal inference from observational data. Standard independence tests assume that the data samples are independent and identically distributed (i.i.d.), but that assumption is violated in many real-world datasets and applications centered on relational systems. This work examines the problem of testing independence in data drawn from relational systems by defining sufficient representations for the sets of observations that influence individual instances. Specifically, we define marginal and conditional independence tests for relational data by using the kernel mean embedding as a flexible aggregation function for relational variables. We propose a consistent, non-parametric, and scalable kernel test that operationalizes relational independence testing for non-i.i.d. observational data under a set of structural assumptions. We evaluate the method empirically on a variety of synthetic and semi-synthetic networks and demonstrate its effectiveness compared with state-of-the-art kernel-based independence tests.
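To make the aggregation idea concrete, here is a minimal sketch, not the authors' test. It assumes each instance's relational variable is a finite set of scalar values (for example, attributes of neighboring nodes), uses a Gaussian RBF base kernel, and aggregates each set by its empirical kernel mean embedding, so the kernel between two sets is the average pairwise base-kernel value. Dependence is then scored with the standard biased HSIC statistic and calibrated by a permutation null. That generic technique is swapped in purely for illustration: it covers only the marginal case and omits the structural assumptions and scalability measures the abstract mentions. All function names (`set_kernel`, `relational_hsic_test`, etc.) and parameters such as `gamma` and `n_perm` are hypothetical.

```python
import math
import random


def rbf(a, b, gamma=1.0):
    """Gaussian RBF base kernel on scalar attribute values (assumed choice)."""
    return math.exp(-gamma * (a - b) ** 2)


def set_kernel(A, B, gamma=1.0):
    """Inner product of the empirical kernel mean embeddings of two sets:
    <mu_A, mu_B> = (1 / (|A| |B|)) * sum_{a in A} sum_{b in B} k(a, b)."""
    total = sum(rbf(a, b, gamma) for a in A for b in B)
    return total / (len(A) * len(B))


def gram(sets, gamma=1.0):
    """Gram matrix of set kernels over all instances."""
    n = len(sets)
    return [[set_kernel(sets[i], sets[j], gamma) for j in range(n)] for i in range(n)]


def center(K):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def hsic_centered(Kc, Lc, perm=None):
    """Biased HSIC estimate trace(Kc Lc) / n^2, optionally with Lc's
    instances permuted (permuting commutes with double-centering)."""
    n = len(Kc)
    p = perm if perm is not None else list(range(n))
    s = sum(Kc[i][j] * Lc[p[i]][p[j]] for i in range(n) for j in range(n))
    return s / n ** 2


def relational_hsic_test(x_sets, y_sets, n_perm=200, gamma=1.0, seed=0):
    """Permutation test of X independent of Y, where each instance's relational
    variable is a set of values aggregated by its kernel mean embedding.
    NOTE: shuffling instances assumes they are exchangeable, which relational
    data may violate; this is a simplifying assumption of the sketch."""
    Kc = center(gram(x_sets, gamma))
    Lc = center(gram(y_sets, gamma))
    stat = hsic_centered(Kc, Lc)
    rng = random.Random(seed)
    n = len(x_sets)
    exceed = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        if hsic_centered(Kc, Lc, perm) >= stat:
            exceed += 1
    p_value = (1 + exceed) / (1 + n_perm)
    return stat, p_value


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical relational data: each instance sees 1-4 neighbor values.
    x_sets = [[rng.uniform(-2, 2) for _ in range(rng.randint(1, 4))] for _ in range(40)]
    y_dep = [[v + rng.gauss(0, 0.05) for v in s] for s in x_sets]
    print(relational_hsic_test(x_sets, y_dep))
```

Because the set kernel is an inner product of mean embeddings, it stays positive semi-definite for sets of any size, which is what lets a standard kernel statistic such as HSIC consume variable-sized relational neighborhoods directly.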