A novel, computationally tractable algorithm flags in big matrices every column associated in any way with others or a dependent variable, with much higher power when columns are linked like mutations in chromosomes

February 21, 2022

Marcos A. Antezana, Carlos A. Machado

Exhaustively scanning a big data matrix (DM) for subsets of independent variables (IVs) that are associated with a dependent variable (DV) is computationally tractable only for 1- and 2-IV effects. I present a highly computationally tractable Participation-In-Association Score (PAS) that, in a DM of markers, flags every column that is strongly associated with others. PAS examines no column subsets, and its computational cost grows linearly with the number of DM columns, remaining reasonable even in million-column DMs. PAS exploits how associations of markers in DM rows cause associations among matches in the rows' pairwise comparisons. For every such comparison with a match at a tested column, PAS computes the matches at the other columns by modifying the comparison's total number of matches (scored once per DM), yielding a distribution of conditional matches that is perturbed by the tested column's associations. Equally tractable is dvPAS, which flags DV-associated IVs by permuting the markers in the DV. P values are obtained by permutation and Sidak-corrected for multiple tests, bypassing model selection. Simulations show that (i) PAS and dvPAS generate uniform(0,1)-distributed type I error in null DMs and (ii) they detect randomly encountered binary and trinary models of significant n-column association and of n-IV association with a binary DV, respectively, with power of the same order of magnitude as exhaustive evaluation and with false positives that are uniform(0,1)-distributed or straightforwardly tuned to be so. Power to detect 2-way DV-associated runs of 100 or more markers is non-parametrically ultimate, but power to detect pure n-column associations and pure n-IV DV associations sinks exponentially as n increases. Power increases about twofold in trinary vs. binary DMs, and markedly when there are background associations like those between mutations in chromosomes, especially in trinary DMs, where dvPAS filters this background most effectively.
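The abstract describes PAS only at a high level. The minimal Python sketch below illustrates one plausible reading of the pairwise-comparison idea for a single tested column. It is not the authors' implementation, and it rests on three assumptions:

- "modifying the comparison's total matches" means subtracting the tested column's own match;
- the score is the mean of the conditional matches;
- significance is assessed one-sided, by shuffling the tested column among rows.

The function names (`pair_totals`, `conditional_mean`, `pas_like_pvalue`, `sidak`) are hypothetical.

```python
import itertools
import random


def pair_totals(dm):
    """Count matching markers for every unordered row pair, once per DM.

    Returns a list of (row_a, row_b, total_matches); cost is O(rows^2 * cols).
    """
    return [
        (a, b, sum(x == y for x, y in zip(dm[a], dm[b])))
        for a, b in itertools.combinations(range(len(dm)), 2)
    ]


def conditional_mean(selector, tested, totals):
    """Mean matches at the other columns over pairs that match in `selector`.

    `tested` is the tested column as observed; its own match is subtracted
    from each precomputed total (hypothetical reading of "modifying").
    """
    values = [
        t - (tested[a] == tested[b])
        for a, b, t in totals
        if selector[a] == selector[b]
    ]
    return sum(values) / len(values) if values else 0.0


def pas_like_pvalue(dm, j, totals, n_perm=199, seed=0):
    """One-sided permutation P value for column j (hypothetical PAS-like score)."""
    rng = random.Random(seed)
    col = [row[j] for row in dm]
    observed = conditional_mean(col, col, totals)
    hits = 0
    for _ in range(n_perm):
        shuffled = col[:]
        rng.shuffle(shuffled)  # shuffling column j breaks its associations
        if conditional_mean(shuffled, col, totals) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def sidak(p, n_tests):
    """Sidak-adjusted P value for n_tests tests: 1 - (1 - p)^n_tests."""
    return 1.0 - (1.0 - p) ** n_tests


if __name__ == "__main__":
    rng = random.Random(42)
    n_rows, n_cols = 60, 20
    dm = [[rng.randint(0, 1) for _ in range(n_cols)] for _ in range(n_rows)]
    for row in dm:
        row[1] = row[2] = row[0]  # columns 0-2 form a strongly associated trio
    totals = pair_totals(dm)
    for j in (0, 10):  # an associated column and an unassociated one
        p = pas_like_pvalue(dm, j, totals)
        print(f"column {j}: P = {p:.3f}, Sidak-adjusted = {sidak(p, n_cols):.3f}")
```

In this sketch the pairwise totals are computed once. After that, each tested column costs one pass over the row pairs per permutation, regardless of how many columns the DM has. This is one way the linear growth in DM columns described above can arise.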
