The property of learning-curve monotonicity, highlighted in a recent line of work by Loog, Mey, and Viering, describes algorithms whose average performance only improves as they receive more data, for any underlying data distribution within a given family. We establish the first nontrivial monotonicity guarantees for the maximum likelihood estimator in a variety of well-specified parametric settings. For sequential prediction with log loss, we show monotonicity (in fact, complete monotonicity) of the forward KL divergence for Gaussian vectors with unknown covariance and either known or unknown mean, as well as for Gamma variables with unknown scale parameter. The Gaussian setting was explicitly highlighted as open in the aforementioned works, even in dimension one. Finally, we observe that for reverse KL divergence, a folklore trick yields monotonicity for very general exponential families. All results in this paper were derived by variants of GPT-5.2 Pro. Humans did not supply any proof strategies or intermediate arguments; they only prompted the model to continue developing additional results, and verified and transcribed its proofs.
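For concreteness, here is a minimal sketch of the monotonicity notions involved, stated in our own notation (which need not match the body of the paper): a learner mapping a sample $S_n \sim P^{\otimes n}$ to an estimate $\hat\theta(S_n)$ is monotone over a family $\mathcal{P}$ if its expected risk never increases with the sample size,
\[
a_n \;:=\; \mathbb{E}_{S_n \sim P^{\otimes n}}\!\left[\,\mathrm{KL}\!\left(P \,\big\|\, P_{\hat\theta(S_n)}\right)\right], \qquad a_{n+1} \le a_n \quad \text{for all } n \text{ and all } P \in \mathcal{P},
\]
where, for sequential prediction with log loss under a well-specified model $P = P_{\theta^*}$, the excess expected log loss of the plug-in predictor $P_{\hat\theta(S_n)}$ is exactly this forward KL divergence. Complete monotonicity of the sequence $(a_n)$ is the stronger requirement that all of its finite differences alternate in sign,
\[
(-1)^k (\Delta^k a)_n \;\ge\; 0 \quad \text{for all } k \ge 0 \text{ and all } n, \qquad (\Delta a)_n := a_{n+1} - a_n,
\]
so that in particular $(a_n)$ is nonincreasing and convex.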