Visual analytics is arguably the most important step in getting acquainted with your data. This is especially true for time series, as this data type is hard to describe and cannot be fully understood through, for example, summary statistics alone. Effective time series visualization imposes four requirements: a tool should be (1) interactive, (2) scalable to millions of data points, (3) integrable into conventional data science environments, and (4) highly configurable. We observe that open source Python visualization toolkits empower data scientists in most visual analytics tasks, but lack the combination of scalability and interactivity needed for effective time series visualization. To meet these requirements, we created Plotly-Resampler, an open source Python library. Plotly-Resampler is an add-on for Plotly's Python bindings that enhances line chart scalability on top of an interactive toolkit by aggregating the underlying data depending on the current graph view. Plotly-Resampler is built to be snappy, as the reactivity of a tool qualitatively affects how analysts visually explore and analyze data. A benchmark task highlights how our toolkit scales better than alternatives in terms of both the number of samples and the number of time series. Additionally, Plotly-Resampler's flexible data aggregation functionality paves the way for research into novel aggregation techniques. Plotly-Resampler's integrability, together with its configurability, convenience, and high scalability, allows users to effectively analyze high-frequency data in their day-to-day Python environment.
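To make the idea of view-dependent aggregation concrete, the following is a minimal sketch in plain Python. It is not Plotly-Resampler's actual implementation or API: the function name `aggregate_view`, its parameters, and the choice of simple min-max binning are assumptions made purely for illustration. The sketch keeps only the samples inside the visible x-range and, if there are too many to draw, retains the minimum and maximum of each bin so that peaks stay visible.

```python
from bisect import bisect_left, bisect_right


def aggregate_view(x, y, x_start, x_end, max_points=1000):
    """Return at most `max_points` samples of (x, y) within [x_start, x_end].

    Hypothetical helper for illustration only (not part of Plotly-Resampler).
    Assumes `x` is sorted in ascending order and len(x) == len(y).
    """
    if max_points < 2:
        raise ValueError("max_points must be at least 2")

    # Restrict to the samples inside the current graph view.
    lo = bisect_left(x, x_start)
    hi = bisect_right(x, x_end)
    n_visible = hi - lo

    # Few enough points: show the raw data without aggregation.
    if n_visible <= max_points:
        return list(x[lo:hi]), list(y[lo:hi])

    # Otherwise split the view into bins and keep each bin's min and max.
    n_bins = max_points // 2
    kept = []
    for b in range(n_bins):
        start = lo + b * n_visible // n_bins
        stop = lo + (b + 1) * n_visible // n_bins
        i_min = min(range(start, stop), key=y.__getitem__)
        i_max = max(range(start, stop), key=y.__getitem__)
        # A set removes the duplicate when min and max coincide;
        # sorting preserves the time order of the two samples.
        kept.extend(sorted({i_min, i_max}))
    return [x[i] for i in kept], [y[i] for i in kept]


# Zoomed out: the full series is reduced to at most 100 points.
x = list(range(10_000))
y = [(i * 7) % 101 for i in x]
xs_overview, ys_overview = aggregate_view(x, y, 0, 9_999, max_points=100)

# Zoomed in: the view holds only 10 samples, so the raw data is returned.
xs_detail, ys_detail = aggregate_view(x, y, 10, 19, max_points=100)
```

In an interactive setting, a function of this kind would be re-run each time the user zooms or pans, so the number of points sent to the front end stays bounded regardless of the size of the underlying series.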