Two linearly uncorrelated binary variables must also be independent, because non-linear dependence cannot manifest with only two possible states. This inherent linearity is the atom of dependency from which any complex form of relationship is built. Inspired by this observation, we develop a framework called binary expansion linear effect (BELIEF) for assessing and understanding arbitrary relationships with a binary outcome. Models from the BELIEF framework are easily interpretable because they describe the association of binary variables in the language of linear models, yielding convenient theoretical insight and striking parallels with the Gaussian world. In particular, an algebraic structure on the predictors with nonzero slopes governs conditional independence properties. With BELIEF, one may study generalized linear models (GLMs) through transparent linear models, providing insight into how modeling is affected by the choice of link. For example, setting a GLM interaction coefficient to zero does not necessarily lead to the kind of no-interaction assumption understood under its linear model counterpart. Furthermore, for a binary response, maximum likelihood estimation for GLMs paradoxically fails under complete separation, when the data are most discriminative, whereas BELIEF estimation automatically reveals the perfect predictor in the data that is responsible for complete separation. We explore these phenomena and provide a host of related theoretical results. We also provide a preliminary empirical demonstration and verification of some of the theoretical results.
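The opening claim, that two uncorrelated binary variables are independent, can be checked numerically. The following is a minimal sketch rather than part of the BELIEF framework itself; the helper names `joint_from`, `covariance`, and `max_independence_gap` are hypothetical and introduced only for illustration. It assumes variables coded as 0/1 and relies on the fact that, in a 2x2 table, every cell deviates from the product of its marginals by exactly the covariance in absolute value.

```python
from itertools import product


def joint_from(p_x1, p_y1, cov):
    """Build a 2x2 joint table for 0/1 variables X and Y.

    p_x1 = P(X=1), p_y1 = P(Y=1), cov = Cov(X, Y).
    The caller is assumed to pass values that give non-negative cells.
    """
    p11 = p_x1 * p_y1 + cov
    return {
        (1, 1): p11,
        (1, 0): p_x1 - p11,
        (0, 1): p_y1 - p11,
        (0, 0): 1.0 - p_x1 - p_y1 + p11,
    }


def covariance(joint):
    """Cov(X, Y) = P(X=1, Y=1) - P(X=1) P(Y=1) for 0/1 variables."""
    p_x1 = joint[(1, 0)] + joint[(1, 1)]
    p_y1 = joint[(0, 1)] + joint[(1, 1)]
    return joint[(1, 1)] - p_x1 * p_y1


def max_independence_gap(joint):
    """Largest |P(X=x, Y=y) - P(X=x) P(Y=y)| over the four cells."""
    p_x = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}
    p_y = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}
    return max(
        abs(joint[(x, y)] - p_x[x] * p_y[y])
        for x, y in product((0, 1), repeat=2)
    )


if __name__ == "__main__":
    for cov in (0.0, 0.05, -0.1):
        table = joint_from(0.3, 0.6, cov)
        print(f"cov={covariance(table):+.3f}  gap={max_independence_gap(table):.3f}")
```

In this sketch the gap is zero exactly when the covariance is zero, so zero correlation forces all four cells to factorize. The same implication does not hold once a variable has three or more states: for X uniform on {-1, 0, 1} and Y = X^2, the covariance is zero although Y is a deterministic function of X.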