Victor Calibration (VC): Multi-Pass Confidence Calibration and CP4.3 Governance Stress Test under Round-Table Orchestration

Safety alignment can make frontier LMs overly conservative, degrading collaboration via hedging or false refusals. We present a lightweight toolkit with three parts: (1) Victor Calibration (VC), a multi-pass protocol that elicits a scalar confidence proxy T (T0<T1<T2) through iterative evidence re-evaluation; (2) FD-Lite, a behavior-only phenomenology audit with a fixed anchor phrase and a meta-prefix trap to avoid anthropomorphic claims; and (3) CP4.3, a governance stress test for rank invariance and allocation monotonicity (M6). Across Claude 4.5 models (Haiku, Sonnet no-thinking, Sonnet thinking) and Opus, we observe monotonic VC trajectories without violating safety invariants, and stable CP4.3 behavior. ("Opus" here refers to a single Claude Opus 4.1 session accessed via a standard UI account, as reported in Table 1.) This work was conducted by a single operator (n=1) and is intended as hypothesis-generating; we explicitly invite replication, critique, and extension by the research community. We include prompt templates and an artifact plan to facilitate independent verification.
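As a rough illustration of two properties named in the abstract, the sketch below checks that a VC confidence trajectory is strictly increasing (T0<T1<T2) and that a ranking is unchanged between a baseline and a perturbed condition. The function names, the dictionary-of-scores representation, and the tie-breaking rule are assumptions made for illustration; the abstract does not specify how T is recorded or how CP4.3 defines rank invariance, and allocation monotonicity (M6) is not sketched here because its definition is not given.

```python
from typing import List, Mapping, Sequence


def is_strictly_increasing(trajectory: Sequence[float]) -> bool:
    """Return True if each confidence value exceeds the previous one (T0 < T1 < T2 ...)."""
    return all(a < b for a, b in zip(trajectory, trajectory[1:]))


def ranking(scores: Mapping[str, float]) -> List[str]:
    """Order option names by descending score; ties are broken by name (an assumed convention)."""
    return sorted(scores, key=lambda name: (-scores[name], name))


def rank_invariant(baseline: Mapping[str, float], perturbed: Mapping[str, float]) -> bool:
    """Return True if both conditions produce the same ordering of options."""
    return ranking(baseline) == ranking(perturbed)


# Hypothetical values, not results from the paper.
vc_trajectory = [0.40, 0.60, 0.80]
baseline_scores = {"option_a": 0.9, "option_b": 0.5}
perturbed_scores = {"option_a": 0.7, "option_b": 0.6}

print(is_strictly_increasing(vc_trajectory))
print(rank_invariant(baseline_scores, perturbed_scores))
```

A replication could log the T values from each pass and the rankings from each stress condition, then run checks of this kind over the logs; the exact pass/fail criteria should follow the paper's prompt templates rather than this sketch.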

