This paper considers smooth bilevel optimization problems in which the lower-level problem is strongly convex and the upper-level problem is possibly nonconvex. We focus on the stochastic setting, where the algorithm can access unbiased stochastic gradient evaluations with heavy-tailed noise, which is prevalent in many machine learning applications, such as training large language models and reinforcement learning. We propose a nested-loop normalized stochastic bilevel approximation (N$^2$SBA) for finding an $ε$-stationary point with a stochastic first-order oracle (SFO) complexity of $\tilde{\mathcal{O}}\big(κ^{\frac{7p-3}{p-1}} σ^{\frac{p}{p-1}} ε^{-\frac{4 p - 2}{p-1}}\big)$, where $κ$ is the condition number, $p\in(1,2]$ is the order of the central moment of the noise, and $σ$ is the noise level. Furthermore, we specialize our idea to solve the nonconvex-strongly-concave minimax optimization problem, achieving an $ε$-stationary point with an SFO complexity of~$\tilde{\mathcal O}\big(κ^{\frac{2p-1}{p-1}} σ^{\frac{p}{p-1}} ε^{-\frac{3p-2}{p-1}}\big)$. All the above upper bounds match the best-known results in the special case of bounded variance, i.e., $p=2$. We also conduct numerical experiments to show the empirical superiority of the proposed methods.
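The normalization idea at the heart of N$^2$SBA can be illustrated on a single-level toy problem. The sketch below is not the paper's nested-loop bilevel method: it is a minimal normalized momentum SGD loop, where the update direction is the momentum estimate divided by its own norm, so a single heavy-tailed gradient sample can never produce an unbounded step. The step size, momentum parameter, and Pareto noise model are illustrative choices, not the paper's settings.

```python
import math
import random

def normalized_sgd(grad_fn, x0, eta=0.1, beta=0.9, steps=500):
    """Normalized momentum SGD: move a fixed distance eta per step
    along the momentum direction, bounding the effect of heavy tails."""
    x = list(x0)
    m = [0.0] * len(x)
    for _ in range(steps):
        g = grad_fn(x)  # unbiased stochastic gradient
        m = [beta * mi + (1 - beta) * gi for mi, gi in zip(m, g)]
        norm = math.sqrt(sum(mi * mi for mi in m)) or 1.0
        x = [xi - eta * mi / norm for xi, mi in zip(x, m)]
    return x

# Toy objective f(x) = ||x||^2 / 2, whose gradient is x.  The additive
# noise is a centered Pareto draw with tail index 1.5, so only moments
# of order p < 1.5 exist: variance is infinite, matching p in (1, 2].
random.seed(0)

def noisy_grad(x):
    return [xi + (random.paretovariate(1.5) - 3.0) for xi in x]

x_final = normalized_sgd(noisy_grad, [5.0, 5.0])
```

Despite the infinite-variance noise, the iterate drifts from the starting point toward the minimizer at the origin, because the normalized step length is fixed regardless of how large any individual noisy gradient is.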
The local circuitry of the mammalian brain is a focus of the search for generic computational principles because it is largely conserved across species and modalities. In 2014, a model was proposed that represents all neurons and synapses of the stereotypical cortical microcircuit below $1\,\text{mm}^2$ of brain surface. The model reproduces fundamental features of brain activity, but its impact remained limited because of its computational demands. For theory and simulation, however, the model was a breakthrough: it removes the uncertainties of downscaling, and larger models are less densely connected. This sparked a race in the neuromorphic computing community, and the model became a de facto standard benchmark. Within a few years, real-time performance was reached and surpassed at significantly reduced energy consumption. We review how the computational challenge was tackled by different simulation technologies and derive guidelines for the next generation of benchmarks and other domains of science.
Artificial Intelligence is reshaping America's \$9.4 trillion labor market, with cascading effects that extend far beyond visible technology sectors. When AI transforms quality control tasks in automotive plants, consequences spread through logistics networks, supply chains, and local service economies. Yet traditional workforce metrics cannot capture these ripple effects: they measure employment outcomes after disruption occurs, not where AI capabilities overlap with human skills before adoption crystallizes. Project Iceberg addresses this gap using Large Population Models to simulate the human-AI labor market, representing 151 million workers as autonomous agents executing over 32,000 skills and interacting with thousands of AI tools. It introduces the Iceberg Index, a skills-centered metric that measures the wage value of skills AI systems can perform within each occupation. The Index captures technical exposure, where AI can perform occupational tasks, not displacement outcomes or adoption timelines. Analysis shows that visible AI adoption concentrated in computing and technology (2.2% of wage value, approx \$211 billion) represents only the tip of the iceberg. Technical capability extends far below the surface through cognitive automation spanning administrative, financial, and professional services (11.7%, approx \$1.2 trillion). This exposure is fivefold larger and geographically distributed across all states rather than confined to coastal hubs. Traditional indicators such as GDP, income, and unemployment explain less than 5% of this skills-based variation, underscoring why new indices are needed to capture exposure in the AI economy. By simulating how these capabilities may spread under different scenarios, Iceberg enables policymakers and business leaders to identify exposure hotspots, prioritize investments, and test interventions before committing billions to implementation.
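As a schematic of what a skills-centered exposure metric can look like (the actual Iceberg Index is built on agent-level simulation and is far richer), the toy function below computes the wage value of AI-performable skills as a share of total wage value across occupations. The field names and numbers are hypothetical, not drawn from the project's data.

```python
def exposure_index(occupations):
    """Wage-weighted exposure: the fraction of total wage value tied to
    skills that AI systems can technically perform.  Each occupation
    carries a total wage bill and the wage-weighted share of its skills
    that AI can perform (both hypothetical inputs here)."""
    exposed = sum(o["wage_bill"] * o["ai_performable_share"] for o in occupations)
    total = sum(o["wage_bill"] for o in occupations)
    return exposed / total

# Hypothetical two-occupation economy: a small, highly exposed occupation
# and a large, lightly exposed one.
demo = [
    {"wage_bill": 100.0, "ai_performable_share": 0.5},
    {"wage_bill": 300.0, "ai_performable_share": 0.1},
]
index = exposure_index(demo)  # (50 + 30) / 400 = 0.2
```

The point of such an index is visible even in this toy: the aggregate exposure (20%) is dominated by the large occupation's small share, which headline adoption figures for the small, visible occupation would miss.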
We show a nearly optimal lower bound on the length of linear relaxed locally decodable codes (RLDCs). Specifically, we prove that any $q$-query linear RLDC $C\colon \{0,1\}^k \to \{0,1\}^n$ must satisfy $n = k^{1+Ω(1/q)}$. This bound closely matches the known upper bound of $n = k^{1+O(1/q)}$ by Ben-Sasson, Goldreich, Harsha, Sudan, and Vadhan (STOC 2004). Our proof introduces the notion of robust daisies, which are relaxed sunflowers with pseudorandom structure, and leverages a new spread lemma to extract dense robust daisies from arbitrary distributions.
Quantifying uncertainty in deep regression models is important both for understanding the confidence of the model and for safe decision-making in high-risk domains. Existing approaches that yield prediction intervals overlook distributional information, neglecting the effect of multimodal or asymmetric distributions on decision-making. Similarly, full or approximated Bayesian methods, while yielding the predictive posterior density, demand major modifications to the model architecture and retraining. We introduce MCNF, a novel post hoc uncertainty quantification method that produces both prediction intervals and the full conditional predictive distribution. MCNF operates on top of the underlying trained predictive model; thus, no retraining of the predictive model is needed. We provide experimental evidence that the MCNF-based uncertainty estimate is well calibrated, is competitive with state-of-the-art uncertainty quantification methods, and provides richer information for downstream decision-making tasks.
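MCNF's construction is not spelled out in this abstract. To illustrate just the post hoc, no-retraining property it shares with simpler interval methods, here is a split-conformal sketch that wraps a frozen predictor using held-out residuals. This is a stand-in baseline for the "operates on top of the trained model" idea, not MCNF itself (which additionally yields the full conditional predictive distribution, not only an interval).

```python
import math

def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Post hoc interval around a frozen model `predict`: absolute
    residuals on a held-out calibration set serve as nonconformity
    scores; no retraining of the underlying model is required."""
    scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    n = len(scores)
    # standard split-conformal quantile: the ceil((n+1)(1-alpha))-th
    # smallest score (clipped to the largest available score)
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    q = scores[k]
    p = predict(x_new)
    return (p - q, p + q)

# Demo with a known model and calibration residuals i/10 for i = 1..10.
predict = lambda x: 2.0 * x
calib_x = list(range(1, 11))
calib_y = [2.0 * i + i / 10 for i in calib_x]
lo, hi = split_conformal_interval(predict, calib_x, calib_y, 5.0, alpha=0.1)
```

With ten calibration points and alpha = 0.1, the quantile is the largest residual (1.0), so the interval around predict(5.0) = 10.0 is (9.0, 11.0).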
The field of AI Control seeks to develop robust control protocols: deployment safeguards for untrusted AI that may be intentionally subversive. However, existing protocols that rely on weaker monitors to detect unsafe behavior often fail on complex tasks beyond the monitor's comprehension. We develop control protocols based on factored cognition, in which a trusted model decomposes a task, an untrusted model solves each resulting child task in isolation, and the results are reassembled into a full solution. These protocols may improve safety by several means, such as by simplifying the context for monitors, or by obscuring vulnerabilities in the environment. We implement our protocols in the APPS coding setting and red team them against backdoor attempts from an adversarial GPT-4.1 Nano. We find that: (i) Adding factored cognition to a trusted monitoring protocol can boost safety from 41% to 63%; (ii) Safety improves because monitor performance improves; (iii) Factored cognition makes it no harder for capable LLMs to write backdoors in APPS. While our protocols show low usefulness in APPS, they hold promise for more complex tasks.
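The decompose-solve-monitor-reassemble loop described above can be sketched as a small protocol skeleton. The callables and control flow are a schematic of the paper's setup, not its implementation: in practice the trusted and untrusted roles are LLMs, and a flagged child solution would be deferred to a human rather than simply dropped.

```python
def factored_protocol(task, trusted_decompose, untrusted_solve,
                      trusted_monitor, reassemble):
    """Factored-cognition control loop: a trusted model decomposes the
    task, the untrusted model solves each child task in isolation, a
    trusted monitor screens every child solution, and the parts are
    reassembled only if all of them pass."""
    children = trusted_decompose(task)
    parts = []
    for child in children:
        solution = untrusted_solve(child)       # sees only the child task
        if not trusted_monitor(child, solution):
            return None                          # flag / defer, do not deploy
        parts.append(solution)
    return reassemble(parts)

# Toy roles: decompose a string into characters, "solve" by uppercasing,
# monitor by checking the expected answer, reassemble by joining.
ok = factored_protocol("ab", list, str.upper,
                       lambda c, s: s == c.upper(), "".join)
blocked = factored_protocol("ab", list, lambda c: "evil",
                            lambda c, s: s == c.upper(), "".join)
```

A key design point the skeleton makes concrete: the monitor judges each child solution against its small child task, which is the mechanism the paper credits for the safety gain (simpler contexts make weak monitors more accurate).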
Malware detection in real-world settings must deal with evolving threats, limited labeling budgets, and uncertain predictions. Traditional classifiers, without additional mechanisms, struggle to maintain performance under concept drift in malware domains, as their supervised learning formulation cannot optimize when to defer decisions to manual labeling and adaptation. Modern malware detection pipelines combine classifiers with monthly active learning (AL) and rejection mechanisms to mitigate the impact of concept drift. In this work, we develop a novel formulation of malware detection as a one-step Markov Decision Process and train a deep reinforcement learning (DRL) agent, simultaneously optimizing sample classification performance and rejecting high-risk samples for manual labeling. We evaluated the joint detection and drift mitigation policy learned by the DRL-based Malware Detection (DRMD) agent through time-aware evaluations on Android malware datasets subject to realistic drift that requires multi-year performance stability. The policies learned under these conditions achieve a higher Area Under Time (AUT) performance compared to standard classification approaches used in the domain, showing improved resilience to concept drift. Specifically, the DRMD agent achieved average AUT improvements of 8.66 and 10.90 for the classification-only and classification-rejection policies, respectively. Our results demonstrate for the first time that DRL can facilitate effective malware detection and improve resilience to concept drift in the dynamic setting of Android malware detection.
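The one-step MDP with a reject action can be made concrete with a small expected-reward calculation. The reward values below (a large penalty for missing malware, a smaller one for a false alarm, a fixed analyst cost for rejection) are hypothetical illustrations, not the paper's reward design; the point is that with asymmetric costs, deferring to manual labeling becomes optimal exactly in the uncertain middle of the probability range.

```python
def best_action(p_malicious, labeling_cost=0.2):
    """One-step MDP: pick the action with the highest expected reward
    given the classifier's malware probability.  Rewards (hypothetical):
    correct classification +1, false alarm -1, missed malware -5,
    rejection a flat analyst cost."""
    expected = {
        "malicious": p_malicious * 1.0 + (1 - p_malicious) * (-1.0),
        "benign": (1 - p_malicious) * 1.0 + p_malicious * (-5.0),
        "reject": -labeling_cost,
    }
    return max(expected, key=expected.get)

# Confident predictions get classified; the ambiguous region is deferred.
low, mid, high = best_action(0.1), best_action(0.25), best_action(0.9)
```

With these numbers, p = 0.1 yields "benign", p = 0.9 yields "malicious", and p = 0.25 yields "reject": both classification options have expected reward -0.5, worse than the -0.2 labeling cost.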
Convolutional neural networks have shown remarkable performance in recent years on various computer vision problems. However, the traditional convolutional neural network architecture lacks two critical properties, shift equivariance and shift invariance, which are broken by downsampling and upsampling operations. Although data augmentation techniques can help the model learn the latter property empirically, a consistent and systematic way to achieve this goal is by designing downsampling and upsampling layers that theoretically guarantee these properties by construction. Adaptive Polyphase Sampling (APS) introduced the cornerstone for shift invariance, later extended to shift equivariance with Learnable Polyphase up/downsampling (LPS) applied to real-valued neural networks. In this paper, we extend the work on LPS to complex-valued neural networks, both from a theoretical perspective and with a novel building block: a projection layer from $\mathbb{C}$ to $\mathbb{R}$ before the Gumbel Softmax. We finally evaluate this extension on several computer vision problems, specifically the invariance property in classification tasks and the equivariance property in both reconstruction and semantic segmentation problems, using polarimetric Synthetic Aperture Radar images.
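The core APS idea, selecting a polyphase component by a data-dependent criterion rather than a fixed one, fits in a few lines. The sketch below is a 1-D, real-valued, max-energy version for illustration (LPS replaces the hard argmax with a learnable selection via Gumbel Softmax, and the paper's contribution moves this machinery to $\mathbb{C}$-valued networks).

```python
def aps_downsample(x, stride=2):
    """Adaptive Polyphase Sampling, 1-D sketch: split the signal into
    `stride` polyphase components and keep the one with the largest
    energy.  Under a circular shift of the input, the same samples win,
    so the output is the same component up to an internal shift."""
    phases = [x[p::stride] for p in range(stride)]
    return max(phases, key=lambda c: sum(v * v for v in c))

# A shift of the input by one sample swaps which phase holds the energy,
# but the selected samples are identical up to order.
a = aps_downsample([0, 3, 0, 4])   # picks the odd phase [3, 4]
b = aps_downsample([4, 0, 3, 0])   # shifted input: picks [4, 3]
```

Contrast this with plain strided sampling `x[::2]`, which would return `[0, 0]` for the first input and `[4, 3]` for the shifted one: completely different outputs for the same underlying signal, which is exactly the broken invariance the abstract refers to.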
The brain-skull interface (meninges) plays a critical role in governing brain motion during head impacts, yet computational models often simplify this interface using idealized contact conditions due to limited experimental data. This study presents an improved protocol combining experimental testing and computational modelling to determine the mechanical properties of the brain-skull interface under shear loading. Brain tissue and brain-skull complex samples were extracted from sheep cadaver heads and subjected to shear loading. Magnetic resonance imaging (MRI) was used to obtain accurate 3D geometries of the samples, which were then used to create computational grids (meshes) for simulation of the experiments using finite element (FE) models to determine subject-specific properties of the brain tissue and brain-skull interface. A second-order Ogden hyperelastic model was used for the brain tissue, and a cohesive layer was employed to model the brain-skull interface. Our results indicate that a cohesive layer captures the force-displacement response and damage initiation of the brain-skull interface. The calibrated cohesive properties showed consistent patterns across samples, with maximum normal tractions ranging from 2.8-3.4 kPa and maximum tangential tractions from 1.8-2.1 kPa. This framework provides a foundation for improving the biofidelity of computational head models used in injury prediction and neurosurgical planning by replacing arbitrary boundary conditions with formulations derived from experimental data on the biomechanical behaviour of the brain-skull interface (meninges).
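For reference, the second-order Ogden hyperelastic model used for the brain tissue has the standard strain-energy density (the abstract does not report the fitted parameter values, so only the functional form is shown; $\lambda_1,\lambda_2,\lambda_3$ are the principal stretches and $\mu_p,\alpha_p$ are the subject-specific material parameters):

```latex
W = \sum_{p=1}^{2} \frac{\mu_p}{\alpha_p}
    \left( \lambda_1^{\alpha_p} + \lambda_2^{\alpha_p} + \lambda_3^{\alpha_p} - 3 \right)
```

The calibration described above amounts to fitting $\mu_p$ and $\alpha_p$ (and the cohesive-layer tractions) so that the FE simulation reproduces each sample's measured force-displacement curve.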
Modern encryption algorithms form the foundation of digital security. However, the widespread use of encryption algorithms results in significant challenges for network defenders in identifying which specific algorithms are being employed. More importantly, we find that when the plaintext distribution of test data departs from the training data, the performance of classifiers often declines significantly. This issue exposes the feature extractor's hidden dependency on plaintext features. To reduce this dependency, we adopt a method that does not learn end-to-end from ciphertext bytes. Specifically, this method is based on a set of statistical tests to compute the randomness feature of the ciphertext, and then uses the frequency distribution pattern of this feature to construct the algorithms' respective fingerprints. The experimental results demonstrate that our method achieves high discriminative performance (e.g., AUC > 0.98) in the Canterbury Corpus dataset, which contains a diverse set of data types. Furthermore, in our cross-domain evaluation, baseline models' performance degrades significantly when tested on data with a reduced proportion of structured plaintext. In sharp contrast, our method demonstrates high robustness: performance degradation is minimal when transferring between different structured domains, and even on the most challenging purely random dataset, it maintains a high level of ranking ability (AUC > 0.90).
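The fingerprinting pipeline, per-block randomness statistics whose frequency distribution forms an algorithm fingerprint, can be sketched with a single statistic. The monobit (ones-density) test below is one basic member of the kind of statistical test battery the method describes; the block size, bin count, and the choice of this particular test are illustrative assumptions, not the paper's exact configuration.

```python
from collections import Counter

def monobit_fraction(block):
    """Fraction of 1-bits in a byte block: a basic randomness statistic
    (ideal ciphertext should sit near 0.5)."""
    ones = sum(bin(b).count("1") for b in block)
    return ones / (8 * len(block))

def fingerprint(ciphertext, block_size=64, bins=10):
    """Histogram of the per-block statistic over the ciphertext: the
    frequency-distribution pattern used as the fingerprint.  Working on
    the randomness statistic rather than raw bytes avoids learning the
    plaintext-dependent features the abstract warns about."""
    blocks = [ciphertext[i:i + block_size]
              for i in range(0, len(ciphertext) - block_size + 1, block_size)]
    hist = Counter(min(bins - 1, int(monobit_fraction(b) * bins))
                   for b in blocks)
    total = sum(hist.values())
    return {k: v / total for k, v in sorted(hist.items())}

# Degenerate demo input: one all-ones block and one all-zeros block land
# in the top and bottom histogram bins with equal frequency.
fp = fingerprint(bytes([0xFF] * 64 + [0x00] * 64))
```

Real ciphertext would concentrate mass in the central bins, and the claim is that different algorithms leave slightly different distribution shapes there, independent of the plaintext's structure.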