G-Enum histograms are a new, fast, and fully automated method for irregular histogram construction. By framing histogram construction as a density estimation problem and its automation as a model selection task, these histograms leverage the Minimum Description Length (MDL) principle to derive two different model selection criteria. Several theoretical results proved for these criteria give insight into their asymptotic behavior and are used to speed up their optimisation. These insights, combined with a greedy search heuristic, are used to construct histograms in linearithmic time rather than the polynomial time required by previous work. The capabilities of the proposed MDL density estimation method are illustrated by comparison with other fully automated methods in the literature, on both synthetic and large real-world data sets.
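To make the idea of MDL-driven model selection with a greedy search concrete, the following is a minimal sketch under stated assumptions, not the G-Enum criteria or algorithm themselves. The cost in `code_length` is a generic enumerative-style code length chosen for illustration, and the names `code_length`, `greedy_histogram`, and `n_elem` are hypothetical. The search is a naive bottom-up merge that recomputes the full cost for every candidate, so it does not reach linearithmic time; doing so would require incremental cost updates and a priority queue. The data are assumed to contain at least two distinct values.

```python
import math


def log_binom(a, b):
    """Natural log of the binomial coefficient C(a, b)."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def code_length(intervals, n, n_elem):
    """Illustrative code length (in nats) of an irregular histogram.

    `intervals` is a list of (count, width) pairs, with width measured in
    elementary bins. Assumption: this generic enumerative-style cost stands
    in for an MDL criterion; it is not the criterion derived in the paper.
    """
    k = len(intervals)
    cost = math.log(n_elem)                  # which number of intervals K
    cost += log_binom(n_elem - 1, k - 1)     # where the K-1 cut points fall
    cost += log_binom(n + k - 1, k - 1)      # how many points per interval
    cost += math.lgamma(n + 1)               # which points go to which interval
    for count, width in intervals:
        cost -= math.lgamma(count + 1)
        cost += count * math.log(width)      # position inside each interval
    return cost


def greedy_histogram(data, n_elem=64):
    """Greedily merge adjacent intervals while the code length decreases."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / n_elem
    counts = [0] * n_elem
    for x in data:
        # Clamp so that the maximum falls in the last elementary bin.
        counts[min(int((x - lo) / step), n_elem - 1)] += 1

    n = len(data)
    intervals = [(c, 1) for c in counts]
    best = code_length(intervals, n, n_elem)
    while len(intervals) > 1:
        candidate, candidate_cost = None, best
        for i in range(len(intervals) - 1):
            (c1, w1), (c2, w2) = intervals[i], intervals[i + 1]
            merged = intervals[:i] + [(c1 + c2, w1 + w2)] + intervals[i + 2:]
            cost = code_length(merged, n, n_elem)
            if cost < candidate_cost:
                candidate, candidate_cost = merged, cost
        if candidate is None:                # no merge shortens the code
            break
        intervals, best = candidate, candidate_cost

    # Convert (count, width) pairs to bin edges and density heights.
    edges, densities = [lo], []
    for count, width in intervals:
        edges.append(edges[-1] + width * step)
        densities.append(count / (n * width * step))
    return edges, densities
```

Calling `greedy_histogram(samples, n_elem=32)` on a list of floats returns the bin edges and the piecewise-constant density heights. Because a greedy search only accepts merges that lower the cost, it can stop at a local minimum, which is the usual trade-off for avoiding an exhaustive search over all partitions.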