Adaptive Execution Scheduler for DataDios SmartDiff

We present an adaptive scheduler for a single differencing engine, SmartDiff, that supports two execution modes: (i) in-memory threaded execution and (ii) Dask-based parallelism. Within fixed CPU and memory budgets, the scheduler continuously tunes batch size and worker/thread count to minimize p95 latency. A lightweight preflight profiler estimates bytes per row and I/O rate; an online cost/memory model prunes unsafe actions; and a guarded hill-climbing policy steers toward lower latency, combined with backpressure and straggler mitigation. Backend selection is gated by a conservative working-set estimate: in-memory execution is chosen when it is safe, and Dask is used otherwise. Across synthetic and public tabular benchmarks, the scheduler reduces p95 latency by 23 to 28 percent versus a tuned warm-up heuristic (35 to 40 percent versus fixed-grid baselines) and lowers peak memory by 16 to 22 percent (25 to 32 percent versus the fixed baselines), with zero out-of-memory (OOM) failures and comparable throughput.
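To make the control loop concrete, here is a minimal, hypothetical sketch. It is not the SmartDiff implementation. The names (`Config`, `choose_backend`, `is_safe`, `hill_climb`, `synthetic_measure`), the 2x in-memory overhead factor, the 70% memory headroom, the 5% improvement guard, and the toy latency model are all assumptions made for illustration. The sketch shows three pieces: a conservative working-set gate that picks threads or Dask; a memory/CPU model that prunes unsafe (batch size, worker count) moves; and a guarded hill climb that accepts a move only when p95 latency improves by a margin. Backpressure and straggler mitigation are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical tuning knobs (assumptions, not SmartDiff's actual values).
OVERHEAD = 2.0   # assumed in-memory expansion factor over raw bytes
HEADROOM = 0.7   # use at most 70% of the memory budget
MIN_GAIN = 0.05  # guard: accept a move only if p95 improves by at least 5%


@dataclass(frozen=True)
class Config:
    batch_rows: int  # rows per batch
    workers: int     # threads (in-memory mode) or Dask workers


def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile; integer ceiling avoids float rounding."""
    ordered = sorted(samples)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def choose_backend(total_rows: int, bytes_per_row: float, mem_budget: float) -> str:
    """Pick in-memory threads only if the conservative working set fits."""
    working_set = total_rows * bytes_per_row * OVERHEAD
    return "threads" if working_set <= HEADROOM * mem_budget else "dask"


def is_safe(cfg: Config, bytes_per_row: float, mem_budget: float, cpu_budget: int) -> bool:
    """Memory/CPU model: reject configs whose estimated peak exceeds the budgets."""
    peak = cfg.workers * cfg.batch_rows * bytes_per_row * OVERHEAD
    return cfg.workers <= cpu_budget and peak <= HEADROOM * mem_budget


def neighbors(cfg: Config) -> List[Config]:
    """One-step moves: double/halve the batch, add/remove one worker."""
    moves = [
        Config(cfg.batch_rows * 2, cfg.workers),
        Config(max(1, cfg.batch_rows // 2), cfg.workers),
        Config(cfg.batch_rows, cfg.workers + 1),
        Config(cfg.batch_rows, max(1, cfg.workers - 1)),
    ]
    return [m for m in moves if m != cfg]


def hill_climb(
    start: Config,  # assumed to be safe already
    measure: Callable[[Config], List[float]],
    bytes_per_row: float,
    mem_budget: float,
    cpu_budget: int,
    max_steps: int = 20,
) -> Tuple[Config, float]:
    """Guarded hill climb on p95 latency, evaluating only safe neighbors."""
    current, best = start, p95(measure(start))
    for _ in range(max_steps):
        safe = [c for c in neighbors(current)
                if is_safe(c, bytes_per_row, mem_budget, cpu_budget)]
        if not safe:
            break
        cand_p95, cand = min(((p95(measure(c)), c) for c in safe), key=lambda t: t[0])
        if cand_p95 < best * (1 - MIN_GAIN):  # guard against noise-driven moves
            current, best = cand, cand_p95
        else:
            break  # no sufficiently better safe neighbor: stop
    return current, best


def synthetic_measure(cfg: Config) -> List[float]:
    """Toy latency model (assumption): 100/workers is parallel compute,
    1000/batch_rows is fixed per-batch overhead, batch_rows/1000 is the
    cost of oversized batches. Returns 20 spread-out samples."""
    base = 100 / cfg.workers + cfg.batch_rows / 1000 + 1000 / cfg.batch_rows
    return [base * (1 + 0.01 * i) for i in range(20)]


if __name__ == "__main__":
    print(choose_backend(10_000, 100, 8e6))  # small working set -> "threads"
    print(hill_climb(Config(1000, 1), synthetic_measure, 100, 8e6, cpu_budget=4))
```

With an 8 MB budget and four CPUs, the climb in this toy model adds workers until the CPU budget blocks further moves, then stops because neither halving nor doubling the batch clears the 5% guard. The guard is what keeps a noisy latency signal from causing oscillation between near-equal configurations.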
