This paper contributes a novel visualization method, the Missingness Glyph, for the analysis and exploration of missing values in data. Missing values are a common challenge in most data-generating domains and may cause a range of analysis issues. Missingness in data may indicate potential problems in data collection and pre-processing, or it may highlight important data characteristics. The development and improvement of statistical methods for dealing with missing data is a research area in its own right, focusing mainly on replacing missing values with estimated values. Considerably less attention has been paid to the visualization of missing values. Nonetheless, visualization and explorative analysis have great potential to support the understanding of missingness in data and to yield novel insights into patterns of missingness in ways that statistical methods cannot. The Missingness Glyph supports the identification of relevant missingness patterns in data. It is evaluated against two other visualization methods on these missingness patterns. The results are promising and confirm that the Missingness Glyph performs better than the alternative visualization methods in several cases.