Facial expression recognition (FER) must remain robust under both cultural variation and perceptually degraded visual conditions, yet most existing evaluations assume homogeneous data and high-quality imagery. We introduce an agent-based, streaming benchmark that reveals how cross-cultural composition and progressive blurring interact to shape FER robustness. Each agent operates in a frozen CLIP feature space with a lightweight residual adapter that is trained online on unblurred inputs (sigma = 0) and kept fixed during testing. Agents move and interact on a 5x5 lattice, while the environment supplies inputs degraded by Gaussian blur whose strength follows a sigma schedule. We examine monocultural populations (Western-only, Asian-only), mixed populations with balanced (5/5) and imbalanced (8/2, 2/8) compositions, and different spatial contact structures. The degradation curves are clearly asymmetric across cultural groups: JAFFE (Asian) populations maintain higher performance at low blur but drop more sharply at intermediate blur levels, whereas KDEF (Western) populations degrade more uniformly. Mixed populations fall between these two patterns: balanced mixtures mitigate early degradation, whereas imbalanced settings amplify the majority group's weaknesses under high blur. These findings quantify how cultural composition and interaction structure influence FER robustness as perceptual conditions deteriorate.
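The environment described in the abstract can be illustrated with a minimal sketch, under several assumptions that the abstract does not confirm: the sigma schedule is taken to be a linear ramp, the blur uses a separable kernel truncated at three standard deviations with edge replication, the 5/5, 8/2, and 2/8 compositions are read as ten agents placed on the 25 lattice cells, and the first number is arbitrarily read as the Western count. All function names (`gaussian_kernel`, `gaussian_blur`, `sigma_schedule`, `assign_cultures`) are hypothetical, and the CLIP encoder and residual adapter are omitted.

```python
import math
import random


def gaussian_kernel(sigma, truncate=3.0):
    """Normalized 1-D Gaussian kernel; sigma <= 0 gives the identity kernel."""
    if sigma <= 0:
        return [1.0]
    radius = max(1, int(truncate * sigma + 0.5))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _convolve_rows(image, kernel):
    """Convolve every row with the kernel, replicating border pixels."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                idx = min(max(x + k - radius, 0), n - 1)  # clamp at edges
                acc += w * row[idx]
            new_row.append(acc)
        out.append(new_row)
    return out


def _transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur of a grayscale image given as a list of rows."""
    kernel = gaussian_kernel(sigma)
    rows_done = _convolve_rows(image, kernel)
    return _transpose(_convolve_rows(_transpose(rows_done), kernel))


def sigma_schedule(sigma_max, steps):
    """Assumed linear ramp from 0 to sigma_max (the abstract gives no schedule)."""
    if steps < 2:
        return [0.0]
    return [sigma_max * i / (steps - 1) for i in range(steps)]


def assign_cultures(n_western, n_asian, side=5, seed=0):
    """Place agents on distinct cells of a side x side lattice (assumed layout)."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(side) for c in range(side)]
    positions = rng.sample(cells, n_western + n_asian)
    labels = ["western"] * n_western + ["asian"] * n_asian
    return dict(zip(positions, labels))


if __name__ == "__main__":
    # A 7x7 test image with a single bright pixel in the center.
    image = [[0.0] * 7 for _ in range(7)]
    image[3][3] = 1.0
    population = assign_cultures(8, 2)  # imbalanced 8/2 mixture
    for sigma in sigma_schedule(sigma_max=2.0, steps=5):
        peak = gaussian_blur(image, sigma)[3][3]
        print(f"sigma={sigma:.2f} center intensity={peak:.4f}")
    print(sum(1 for v in population.values() if v == "western"), "western agents")
```

In a real pipeline the blur would more likely come from an image library, and each blurred frame would pass through the frozen encoder and the fixed adapter before evaluation; the pure-Python version here only makes the schedule and population setup concrete.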