Modern cloud databases present scaling as a binary decision: scaling out by adding nodes or scaling up by increasing per-node resources. This one-dimensional view is limiting because database performance, cost, and coordination overhead emerge from the joint interaction of horizontal elasticity and per-node CPU, memory, network bandwidth, and storage IOPS. As a result, systems often overreact to load spikes, underreact to memory pressure, or oscillate between suboptimal states. We introduce the Scaling Plane, a two-dimensional model in which each distributed database configuration is represented as a point (H, V), with H denoting the node count and V a vector of per-node resources. Over this plane, we define smooth approximations of latency, throughput, coordination overhead, and monetary cost, providing a unified view of performance trade-offs. We show analytically and empirically that optimal scaling trajectories frequently lie along diagonal paths: sequences of joint horizontal and vertical adjustments that simultaneously exploit cluster parallelism and per-node improvements. To compute such actions, we propose DIAGONALSCALE, a discrete local-search algorithm that evaluates horizontal, vertical, and diagonal moves in the Scaling Plane and selects the configuration that minimizes a multi-objective function subject to SLA constraints. Using synthetic surfaces, microbenchmarks, and experiments on distributed SQL and KV systems, we demonstrate that, compared to horizontal-only and vertical-only autoscaling, diagonal scaling reduces p95 latency by up to 40 percent, lowers cost-per-query by up to 37 percent, and reduces rebalancing by a factor of 2 to 5. Our results highlight the need for multi-dimensional scaling models and provide a foundation for next-generation autoscaling in cloud database systems.
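To make the local-search idea concrete, the following is a minimal Python sketch of one search step over the Scaling Plane. It is an illustration under stated assumptions, not the DIAGONALSCALE implementation: V is collapsed to a single integer "tier" rather than a resource vector, and `toy_latency`, `toy_cost`, the weights `alpha` and `beta`, and all constants are hypothetical stand-ins for the smooth latency and cost surfaces the abstract refers to.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Config:
    h: int  # H: node count
    v: int  # V: per-node resource tier (ASSUMPTION: scalar stand-in for the vector V)


def neighbors(c: Config, h_max: int, v_max: int) -> Iterator[Config]:
    """Yield horizontal, vertical, and diagonal one-step moves within bounds."""
    for dh in (-1, 0, 1):
        for dv in (-1, 0, 1):
            if dh == 0 and dv == 0:
                continue
            h, v = c.h + dh, c.v + dv
            if 1 <= h <= h_max and 1 <= v <= v_max:
                yield Config(h, v)


def toy_latency(c: Config, load: float) -> float:
    """HYPOTHETICAL latency surface (ms): service + queueing + coordination."""
    capacity = c.h * c.v * 100.0              # assumed queries/s the cluster can serve
    utilization = min(load / capacity, 0.99)  # clamp to keep the queueing term finite
    service = 10.0 / c.v                      # larger nodes serve each query faster
    queueing = service * utilization / (1.0 - utilization)
    coordination = 0.5 * (c.h - 1)            # overhead grows with node count
    return service + queueing + coordination


def toy_cost(c: Config) -> float:
    """HYPOTHETICAL cost surface: larger tiers are priced superlinearly."""
    return c.h * (c.v ** 1.2)


def objective(c: Config, load: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted multi-objective score; alpha and beta are assumed weights."""
    return alpha * toy_latency(c, load) + beta * toy_cost(c)


def diagonal_scale_step(current: Config, load: float, sla_ms: float,
                        h_max: int = 16, v_max: int = 8) -> Config:
    """Pick the best SLA-feasible configuration among current and its neighbors."""
    candidates = [current, *neighbors(current, h_max, v_max)]
    feasible = [c for c in candidates if toy_latency(c, load) <= sla_ms]
    if not feasible:
        # No candidate meets the SLA: move toward the lowest-latency candidate.
        return min(candidates, key=lambda c: toy_latency(c, load))
    return min(feasible, key=lambda c: objective(c, load))


if __name__ == "__main__":
    cfg = Config(h=2, v=2)
    for _ in range(5):  # repeat the step a few times to trace a trajectory
        cfg = diagonal_scale_step(cfg, load=300.0, sla_ms=50.0)
        print(cfg, round(toy_latency(cfg, 300.0), 2), round(toy_cost(cfg), 2))
```

Because the current configuration is always among the candidates, a step never increases the objective when the current point already satisfies the SLA. Restricting moves to one step per axis is a design choice of this sketch, intended to keep each adjustment small.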