With advances in vision and perception architectures, we have realized that working with data is as crucial as the algorithms, if not more so. Until now, we have trained machines based on our own knowledge and perspective of the world. The entire concept of the Dataset Structural Index (DSI) revolves around understanding a machine's perspective of the dataset. With DSI, I present two meta values that reveal more information about a visual dataset and can be used to optimize data, create better architectures, and estimate which model would work best. These two values are the variety contribution ratio and the similarity matrix. In the paper, I show many applications of DSI, one of which is how the same level of accuracy can be achieved with the same model architectures trained on less data.