A fundamental challenge for any intelligent system is prediction: given inputs $X_1, \ldots, X_\tau$, can you predict the outcomes $Y_1, \ldots, Y_\tau$? The KL divergence $\mathbf{d}_{\mathrm{KL}}$ provides a natural measure of prediction quality, but the majority of deep learning research looks only at the marginal prediction for each input $X_t$. In this technical report we propose a scoring rule $\mathbf{d}_{\mathrm{KL}}^\tau$, parameterized by $\tau \in \mathbb{N}$, that evaluates the joint predictions at $\tau$ inputs simultaneously. We show that the commonly-used $\tau = 1$ can be insufficient to drive good decisions in many settings of interest. We also show that, as $\tau$ grows, performing well according to $\mathbf{d}_{\mathrm{KL}}^\tau$ recovers universal guarantees for any possible decision. Finally, we provide problem-dependent guidance on the scale of $\tau$ for which our score provides sufficient guarantees for good performance.
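To make the distinction between marginal and joint evaluation concrete, the following sketch (an assumed toy setup, not from the report) constructs an environment where an agent's marginal predictions are perfect ($\mathbf{d}_{\mathrm{KL}}^1 = 0$) yet its joint prediction over $\tau$ outcomes is badly wrong, so only $\mathbf{d}_{\mathrm{KL}}^\tau$ with $\tau > 1$ detects the failure:

```python
# Hypothetical illustration: a coin whose bias is either 0 or 1, each with
# probability 1/2. Over tau flips the true joint puts mass 1/2 on "all heads"
# and 1/2 on "all tails", while every single-flip marginal is uniform.
import numpy as np

def kl(p, q):
    """KL divergence between two discrete distributions on the same support."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

tau = 4
# Enumerate all 2^tau outcome sequences as bit tuples.
outcomes = [tuple(int(b) for b in np.binary_repr(i, tau)) for i in range(2 ** tau)]
# True joint: mass only on the two constant sequences.
p_true = np.array([0.5 if len(set(o)) == 1 else 0.0 for o in outcomes])
# Agent predicts i.i.d. fair flips: perfect marginals, wrong joint.
q_agent = np.full(2 ** tau, 0.5 ** tau)

marginal_score = kl([0.5, 0.5], [0.5, 0.5])  # d_KL^1 on a single flip
joint_score = kl(p_true, q_agent)            # d_KL^tau on the full sequence

print(marginal_score)  # 0.0
print(joint_score)     # (tau - 1) * ln 2, about 2.079 for tau = 4
```

The joint score grows as $(\tau - 1)\ln 2$: the agent pays for ignoring the correlation between flips, exactly the information that drives downstream decisions such as how many flips to observe before betting.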