We live in an era of data floods, and deep neural networks play a pivotal role in this moment. Natural data inherently exhibits several challenges such as long-tailed distribution and model fairness, where data imbalance is at the center of fundamental issues. This imbalance poses a risk of deep neural networks producing biased predictions, leading to potentially severe ethical and social problems. To address these problems, we leverage the recent generative models advanced in generating high-quality images. In this work, we propose SYNAuG, which utilizes synthetic data to uniformize the given imbalance distribution followed by a simple post-calibration step considering the domain gap between real and synthetic data. This straightforward approach yields impressive performance on datasets for distinctive data imbalance problems such as CIFAR100-LT, ImageNet100-LT, UTKFace, and Waterbirds, surpassing the performance of existing task-specific methods. While we do not claim that our approach serves as a complete solution to the problem of data imbalance, we argue that supplementing the existing data with synthetic data proves to be an effective and crucial step in addressing data imbalance concerns.
翻译:暂无翻译