Data extraction algorithms on data hypercubes, or datacubes, are traditionally only capable of cutting boxes of data along the datacube axes. For many use cases however, this is not a sufficient approach and returns more data than users might actually need. This not only forces users to apply post-processing after extraction, but more importantly this consumes more I/O resources than is necessary. When considering very large datacubes from which users only want to extract small non-rectangular subsets, the box approach does not scale well. Indeed, with this traditional approach, I/O systems quickly reach capacity, trying to read and return unwanted data to users. In this paper, we propose a novel technique, based on computational geometry concepts, which instead carefully pre-selects the precise bytes of data which the user needs in order to then only read those from the datacube. As we discuss later on, this novel extraction method will considerably help scale access to large petabyte size data hypercubes in a variety of scientific fields.
翻译:暂无翻译