Numerous advances in deep learning can be attributed to the availability of large-scale, well-annotated datasets. However, such datasets are prohibitively expensive to build in 3D computer vision due to the substantial collection cost. To alleviate this issue, we propose a cost-effective method for automatically generating a large number of annotated 3D objects. In particular, we synthesize objects simply by assembling multiple random primitives. These objects are thus auto-annotated with part labels originating from the primitives. This allows us to perform multi-task learning that combines supervised segmentation with unsupervised reconstruction. Considering the large overhead of learning on the generated dataset, we further propose a dataset distillation strategy that removes samples which are redundant with respect to a target dataset. We conduct extensive experiments on the downstream task of 3D object classification. The results indicate that our dataset, together with multi-task pretraining on its annotations, achieves the best performance compared with other commonly used datasets. Further study suggests that our strategy can improve model performance via a pretraining and fine-tuning scheme, especially for small-scale datasets. In addition, pretraining with the proposed dataset distillation method saves 86\% of the pretraining time with negligible performance degradation. We expect that our attempt provides a new data-centric perspective for training 3D deep models.
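The generation idea described above — assembling random primitives so that part labels come for free — can be illustrated with a minimal sketch. The function names, primitive choices (spheres and cuboids), and sampling parameters below are illustrative assumptions, not the paper's actual implementation:

```python
import random

def sample_sphere(n, radius, center):
    # Hypothetical helper: uniform points inside a sphere via rejection sampling.
    pts = []
    while len(pts) < n:
        x, y, z = (random.uniform(-radius, radius) for _ in range(3))
        if x * x + y * y + z * z <= radius * radius:
            pts.append((center[0] + x, center[1] + y, center[2] + z))
    return pts

def sample_cuboid(n, half_extents, center):
    # Hypothetical helper: uniform points inside an axis-aligned cuboid.
    return [tuple(c + random.uniform(-h, h) for c, h in zip(center, half_extents))
            for _ in range(n)]

def generate_object(num_primitives=4, points_per_primitive=256):
    # Assemble random primitives into one point cloud; each point is
    # auto-annotated with the index of the primitive it came from,
    # yielding free part-segmentation labels.
    points, labels = [], []
    for part_id in range(num_primitives):
        center = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
        if random.random() < 0.5:
            pts = sample_sphere(points_per_primitive,
                                random.uniform(0.2, 0.5), center)
        else:
            half = tuple(random.uniform(0.2, 0.5) for _ in range(3))
            pts = sample_cuboid(points_per_primitive, half, center)
        points.extend(pts)
        labels.extend([part_id] * len(pts))
    return points, labels
```

Because the labels are a byproduct of generation, an arbitrarily large auto-annotated dataset can be produced at essentially no annotation cost, which is what enables the supervised-segmentation branch of the multi-task pretraining.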