Proper statistical modeling incorporates domain theory about how concepts relate and details of how data were measured. However, data analysts currently lack tool support for recording and reasoning about domain assumptions, data collection, and modeling choices in an integrated manner, leading to mistakes that can compromise scientific validity. For instance, generalized linear mixed-effects models (GLMMs) help answer complex research questions, but omitting random effects impairs the generalizability of results. To address this need, we present Tisane, a mixed-initiative system for authoring generalized linear models with and without mixed effects. Tisane introduces a study design specification language for expressing and asking questions about relationships between variables. Tisane contributes an interactive compilation process that represents relationships in a graph, infers candidate statistical models, and asks follow-up questions to disambiguate user queries and construct a valid model. In case studies with three researchers, we find that Tisane helps them focus on their goals and assumptions while avoiding past mistakes.
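To make the idea of representing relationships in a graph and inferring candidate models concrete, the following is a minimal sketch in plain Python. It does not use Tisane's actual API: the class `StudyGraph`, the relationship labels, the variable names, and the inference rules are all hypothetical and chosen only for illustration. The sketch assumes that a variable with a direct causal or associative edge into the dependent variable becomes a candidate main effect, and that each group the observed unit is nested within becomes a candidate random intercept, written in lme4-style formula notation.

```python
from collections import defaultdict

# Hypothetical relationship labels for this sketch; not Tisane's actual API.
CAUSES = "causes"
ASSOCIATES = "associates_with"
NESTS = "nests_within"


class StudyGraph:
    """Toy directed graph of relationships between study variables."""

    def __init__(self):
        # Maps a source variable to a list of (label, target) pairs.
        self.edges = defaultdict(list)

    def add(self, source, label, target):
        self.edges[source].append((label, target))

    def candidate_main_effects(self, dv):
        """Variables with a direct causal or associative edge into dv."""
        return sorted({
            src
            for src, outgoing in self.edges.items()
            for label, tgt in outgoing
            if tgt == dv and label in (CAUSES, ASSOCIATES)
        })

    def candidate_random_intercepts(self, unit):
        """Groups the unit is nested within, following nesting edges transitively."""
        groups, frontier = [], [unit]
        while frontier:
            node = frontier.pop()
            for label, tgt in self.edges.get(node, []):
                if label == NESTS and tgt not in groups:
                    groups.append(tgt)
                    frontier.append(tgt)
        return groups


def candidate_formula(graph, dv, unit):
    """Assemble one candidate mixed-effects formula from the graph."""
    fixed = graph.candidate_main_effects(dv)
    random = [f"(1 | {group})" for group in graph.candidate_random_intercepts(unit)]
    return f"{dv} ~ " + " + ".join(fixed + random)


def build_example():
    """Illustrative study: students nested in classrooms nested in schools."""
    g = StudyGraph()
    g.add("tutoring", CAUSES, "test_score")
    g.add("motivation", ASSOCIATES, "test_score")
    g.add("student", NESTS, "classroom")
    g.add("classroom", NESTS, "school")
    return g


if __name__ == "__main__":
    print(candidate_formula(build_example(), "test_score", "student"))
```

In a mixed-initiative workflow, a formula like this would be a starting candidate rather than a final answer: the system would still ask the analyst follow-up questions, for example whether to include an interaction or a random slope, before settling on a model.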