The self-improving sorter proposed by Ailon et al. consists of two phases: a relatively long training phase and a fast operation phase. In this study, we develop an efficient way to improve this sorter further by approximating its training phase. This makes training faster without sacrificing much performance in the operation phase. When testing the performance of this approximated sorter, the estimated entropy must be accurate. We therefore derive a formula for an upper bound on the error of the entropy estimated from input data with unknown distributions. Our work should let this self-improving sorter be applied to large datasets more quickly.
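The two-phase structure can be illustrated with a simplified Python sketch. It is not the authors' approximation, and it is not Ailon et al.'s exact construction. Its assumptions are as follows:

- Each input `x_i` is drawn independently (a product distribution).
- Training merges `lam` sample inputs and keeps every `lam`-th value as bucket boundaries, in the spirit of the V-list of Ailon et al. It then counts how often each position lands in each bucket over `m` further samples.
- The operation phase first checks each position's most frequent training buckets and then falls back to binary search. This is a crude stand-in for the entropy-optimal search trees of the original sorter.

The names `train`, `operate`, `locate`, `build_v_list`, `plug_in_entropy` and the parameters `lam`, `m`, `top` are hypothetical. The value `h_est` sums plug-in (maximum-likelihood) entropy estimates of the per-position bucket distributions. It is not the error-bound formula described above.

```python
import bisect
import math
import random
from collections import Counter
from typing import Callable, List, Sequence


def build_v_list(samples: Sequence[Sequence[float]]) -> List[float]:
    """Merge lam sample inputs of length n and keep every lam-th value (n boundaries)."""
    lam = len(samples)
    merged = sorted(x for inp in samples for x in inp)
    return merged[lam - 1::lam]


def plug_in_entropy(counts: Counter) -> float:
    """Plug-in (maximum-likelihood) Shannon entropy, in bits, of observed counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def train(sample_input: Callable[[], List[float]], lam: int, m: int, top: int = 3):
    """Training phase (simplified): boundaries plus per-position bucket frequencies."""
    v_list = build_v_list([sample_input() for _ in range(lam)])
    counts = [Counter() for _ in range(len(v_list))]
    for _ in range(m):
        for i, x in enumerate(sample_input()):
            counts[i][bisect.bisect_left(v_list, x)] += 1
    # For each position, remember the buckets it fell into most often.
    likely = [[k for k, _ in c.most_common(top)] for c in counts]
    # Sum of estimated per-position bucket entropies (subject to estimation error).
    h_est = sum(plug_in_entropy(c) for c in counts)
    return v_list, likely, h_est


def locate(v_list: List[float], x: float, likely: List[int]) -> int:
    """Try the frequent buckets first; otherwise binary search (same result either way)."""
    n = len(v_list)
    for k in likely:
        if (k == 0 or v_list[k - 1] < x) and (k == n or x <= v_list[k]):
            return k
    return bisect.bisect_left(v_list, x)


def operate(v_list: List[float], likely: List[List[int]], x: List[float]) -> List[float]:
    """Operation phase: bucket each value, sort the (expected small) buckets, concatenate."""
    buckets: List[List[float]] = [[] for _ in range(len(v_list) + 1)]
    for i, value in enumerate(x):
        buckets[locate(v_list, value, likely[i])].append(value)
    out: List[float] = []
    for b in buckets:
        out.extend(sorted(b))
    return out


if __name__ == "__main__":
    random.seed(0)
    n = 200
    # Hypothetical product distribution: x_i ~ N(i, 2), independent across positions.
    source = lambda: [random.gauss(i, 2.0) for i in range(n)]
    v_list, likely, h_est = train(source, lam=8, m=100)
    x = source()
    assert operate(v_list, likely, x) == sorted(x)
    print(f"estimated entropy sum: {h_est:.1f} bits")
```

The plug-in estimator tends to underestimate entropy when `m` is small relative to the number of distinct buckets a position can reach. This is why a bound on the estimation error matters when an approximated sorter is evaluated.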