Data Point Selection for Line Chart Visualization: Methodological Assessment and Evidence-Based Guidelines

Time series visualization plays a crucial role in identifying patterns and extracting insights across various domains. However, as datasets continue to grow in size, visualizing them effectively becomes challenging. Downsampling, which involves data aggregation or selection, is a well-established approach to overcome this challenge. This work focuses on data selection algorithms, which accomplish downsampling by selecting values from the original time series. Despite their widespread adoption in visualization platforms and time series databases, there is limited literature on the evaluation of these techniques. To address this, we propose an extensive metrics-based evaluation methodology. Our methodology analyzes visual representativeness by assessing how well a downsampled time series line chart visually approximates the original data. Moreover, our methodology includes a novel concept called "visual stability", which captures visual changes when updating (streaming) or interacting with the visualization (panning and zooming). We evaluated four data point selection algorithms across three open-source visualization toolkits using our proposed methodology, considering various figure-drawing properties. Following the analysis of our findings, we formulated a set of evidence-based guidelines for line chart visualization at scale with downsampling. To promote reproducibility and enable the qualitative evaluation of new advancements in time series data point selection, we have made our methodology and results openly accessible. The proposed evaluation methodology, along with the obtained insights from this study, establishes a foundation for future research in this domain.
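To make "downsampling by data selection" concrete, here is a minimal sketch of one common selection scheme, per-bucket min/max selection. It is illustrative only: the abstract does not name the four algorithms that were evaluated, and the function name `min_max_select` and its parameters are hypothetical. The key property of selection (as opposed to aggregation) is that every output point is an actual point from the original series.

```python
from typing import List, Sequence


def min_max_select(values: Sequence[float], n_out: int) -> List[int]:
    """Return indices of at most n_out points chosen from `values`.

    Illustrative assumption: the series is split into n_out // 2
    contiguous buckets of near-equal size, and the index of the minimum
    and of the maximum value in each bucket is kept.
    """
    if n_out < 2 or n_out % 2 != 0:
        raise ValueError("n_out must be an even number >= 2")

    n = len(values)
    if n <= n_out:
        # Nothing to reduce: keep every point.
        return list(range(n))

    n_buckets = n_out // 2
    selected: List[int] = []
    for b in range(n_buckets):
        # Integer arithmetic gives contiguous, non-empty buckets.
        start = b * n // n_buckets
        end = (b + 1) * n // n_buckets
        bucket = range(start, end)
        i_min = min(bucket, key=values.__getitem__)
        i_max = max(bucket, key=values.__getitem__)
        # Keep indices in time order; a flat bucket yields a single index.
        selected.extend(sorted({i_min, i_max}))
    return selected


series = [0, 5, 1, 9, -3, 2, 7, 4]
indices = min_max_select(series, n_out=4)
downsampled = [series[i] for i in indices]
```

Because the function returns indices rather than new values, the downsampled line chart is drawn through a subset of the original samples, which is what allows it to be compared visually against the full-resolution chart.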
