Clustered Federated Multi-task Learning (CFL) has emerged as a promising technique to address statistical challenges, particularly with non-independent and identically distributed (non-IID) data across users. However, existing CFL studies entirely rely on the impractical assumption that devices possess access to accurate ground-truth labels. This assumption becomes problematic in hierarchical wireless networks (HWNs), with vast unlabeled data and dual-level model aggregation, slowing convergence speeds, extending processing times, and increasing resource consumption. To this end, we propose Clustered Federated Semi-Supervised Learning (CFSL), a novel framework tailored for realistic scenarios in HWNs. We leverage specialized models from device clustering and present two prediction model schemes: the best-performing specialized model and the weighted-averaging ensemble model. The former assigns the most suitable specialized model to label unlabeled data, while the latter unifies specialized models to capture broader data distributions. CFSL introduces two novel prediction time schemes, split-based and stopping-based, for accurate labeling timing, and two device selection strategies, greedy and round-robin. Extensive testing validates CFSL's superiority in labeling/testing accuracy and resource efficiency, achieving up to 51% energy savings.
翻译:暂无翻译