Data-driven decision-making is at the core of many modern applications, and understanding the data is critical in supporting trust in these decisions. However, data is dynamic and evolving, just like the real-world entities it represents. Thus, an important component of understanding data is analyzing and drawing insights from the changes it undergoes. Existing methods for exploring data change list differences exhaustively, which are not interpretable by humans and lack salient insights regarding change trends. For example, an explanation that semantically summarizes changes to highlight gender disparities in performance rewards is more human-consumable than a long list of employee salary changes. We demonstrate ChARLES, a system that derives semantic summaries of changes between two snapshots of an evolving database, in an effective, concise, and interpretable way. Our key observation is that, while datasets often evolve through point and other small-batch updates, rich data features can reveal latent semantics that can intuitively summarize the changes. Under the hood, ChARLES compares database versions, infers feasible transformations by fitting multiple regression lines over different data partitions to derive change summaries, and ranks them. ChARLES allows users to customize it to obtain their preferred explanation by navigating the accuracy-interpretability tradeoff, and offers a proof of concept for reasoning about data evolution over real-world datasets.
翻译:暂无翻译