The mainstream workflow of image recognition applications is to first train one global model on the cloud for a wide range of classes and then serve numerous clients, each with heterogeneous images drawn from a small subset of classes. Given this cloud-client discrepancy in the range of image classes, the recognition model should be strongly adaptive, intuitively by concentrating on each individual client's local, dynamic class subset, while incurring negligible overhead. In this work, we propose to plug a new intra-client and inter-image attention (ICIIA) module into existing backbone recognition models, requiring only one-time cloud-based training to become client-adaptive. In particular, given a target image from a certain client, ICIIA introduces multi-head self-attention to retrieve relevant images from the client's historical unlabeled images, thereby calibrating the focus and the recognition result. Further, since ICIIA's overhead is dominated by linear projection, we propose partitioned linear projection with feature shuffling as a replacement, allowing the number of partitions to be increased to dramatically improve efficiency without sacrificing much accuracy. We finally evaluate ICIIA on 3 different recognition tasks with 9 backbone models over 5 representative datasets. Extensive evaluation results demonstrate the effectiveness and efficiency of ICIIA. Specifically, for ImageNet-1K with the backbone models of MobileNetV3-L and Swin-B, ICIIA can improve the testing accuracy to 83.37% (+8.11%) and 88.86% (+5.28%), while adding only 1.62% and 0.02% of FLOPs, respectively.
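The efficiency idea in the abstract can be illustrated with a minimal sketch: a full linear projection of a d-dimensional feature costs on the order of d² multiplies, whereas splitting the feature into P partitions and projecting each independently costs d²/P, with a channel shuffle interleaving partitions so later layers can still mix information. This is an illustrative NumPy sketch under assumed shapes, not the paper's implementation; the function name and shuffle scheme are assumptions.

```python
import numpy as np

def partitioned_linear(x, weights, shuffle=True):
    """Partitioned linear projection with feature shuffling (sketch).

    x: flat feature vector of size d; weights: list of P square
    (d/P, d/P) matrices. Each partition is projected independently,
    cutting the multiply count from d*d down to d*d/P; the shuffle
    then interleaves channels across partitions.
    """
    P = len(weights)
    parts = np.split(x, P)                      # P chunks of size d/P
    projected = [W @ p for W, p in zip(weights, parts)]
    y = np.concatenate(projected)
    if shuffle:
        # channel shuffle: view as (P, d/P), transpose, flatten
        y = y.reshape(P, -1).T.reshape(-1)
    return y

# toy usage: d = 8 features, P = 4 partitions
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
weights = [rng.standard_normal((2, 2)) for _ in range(4)]
y = partitioned_linear(x, weights)
print(y.shape)  # (8,)
```

With identity partition weights, the output is simply the channel-shuffled input, which makes the interleaving pattern easy to inspect.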