Exhaustively scanning a big data matrix (DM) for subsets of independent variables (IVs) that are associated with a dependent variable (DV) is computationally tractable only for 1- and 2-IV effects. I present a highly computationally tractable Participation-In-Association Score (PAS) that, in a DM with markers, flags every column that is strongly associated with others. PAS examines no column subsets, and its computational cost grows linearly with the number of DM columns, remaining reasonable even in million-column DMs. PAS exploits how associations of markers in DM rows cause associations of matches in the rows' pairwise comparisons. For every such comparison with a match at a tested column, PAS computes the matches at the other columns by modifying the comparison's total matches (scored once per DM), yielding a distribution of conditional matches that is perturbed by associations of the tested column. Equally tractable is dvPAS, which flags DV-associated IVs by permuting the markers in the DV. P values are obtained by permutation and Sidak-corrected for multiple tests, bypassing model selection. Simulations show that PAS and dvPAS i) generate uniform-(0,1)-distributed type I error in null DMs and ii) detect randomly encountered binary and trinary models of significant n-column association and n-IV association with a binary DV, respectively, with power of the same order of magnitude as that of exhaustive evaluation and with false positives that are uniform-(0,1)-distributed or straightforwardly tuned to be so. Power to detect 2-way DV-associated runs of 100 or more markers is non-parametrically ultimate, but power to detect pure n-column associations and pure n-IV DV associations sinks exponentially as n increases. Power increases about twofold in trinary vs. binary DMs and increases substantially when there are background associations, such as those between mutations in chromosomes, especially in trinary DMs, where dvPAS filters said background most effectively.
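The conditional-match step described above can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names (`total_matches`, `conditional_matches`, `sidak`) are hypothetical, and the sketch assumes that "modifying the comparison's total matches" means subtracting the one match at the tested column. For clarity it stores the full pair-by-column match table, which a million-column implementation would presumably avoid. It stops at the distribution of conditional matches, because the abstract does not specify how PAS summarizes that distribution. The Sidak adjustment uses the standard formula 1 - (1 - p)^m for m tests.

```python
import numpy as np

def total_matches(dm):
    """Compare every pair of rows once; return the per-column match
    table and the total number of matches for each pair."""
    i, j = np.triu_indices(dm.shape[0], k=1)  # all row pairs with i < j
    eq = dm[i] == dm[j]                       # shape: (n_pairs, n_columns)
    return eq, eq.sum(axis=1)

def conditional_matches(eq, totals, col):
    """For pairs that match at column `col`, return the number of
    matches at the other columns (assumed here to be total minus 1)."""
    mask = eq[:, col]
    return totals[mask] - 1

def sidak(p, m):
    """Standard Sidak correction of a P value for m independent tests."""
    return 1.0 - (1.0 - p) ** m

# Toy DM: 3 rows, 3 binary marker columns (illustrative values only).
dm = np.array([[0, 0, 1],
               [0, 0, 0],
               [1, 1, 1]])
eq, totals = total_matches(dm)
for col in range(dm.shape[1]):
    print(col, conditional_matches(eq, totals, col))
```

Because the totals are computed once and reused for every tested column, the per-column work in this sketch is a mask and a subtraction, which is consistent with the linear growth in cost that the abstract reports.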