The quality of data representation is paramount for the performance of a model. Recent research has focused on enhancing representation learning by incorporating more information about the intra-sample structures of individual data points, such as local and global attention. Additionally, researchers have explored methods to model the inter-sample relationships, including manifold, contrastive, and discrete representation learning. In this study, we introduce a new training loss, which considers both intra-sample structure and inter-sample relationships, leveraging the concept of {\it atoms} to represent data points. This new approach, {\it Atom Modeling}, offers a fresh perspective to discretize data representations within a continuous space. Through experiments, we demonstrate that Atom Modeling enhances the performance of existing models in tasks involving classification and generation, across diverse domains including vision and language. These findings underscore the potential of Atom Modeling to enhance data representation and improve model learning, suggesting a promising direction for future research.
翻译:暂无翻译