Information theory is important to machine learning, but the notation for information-theoretic quantities is sometimes opaque. The right notation can convey valuable intuitions and concisely express new ideas. We propose such a notation for machine learning users and expand it to include information-theoretic quantities between observed outcomes (events) and random variables. To demonstrate the value of our notation, we first apply it to elegantly prove a version of Stirling's approximation for binomial coefficients mentioned by MacKay. Second, we apply the notation to a popular information-theoretic acquisition function in Bayesian active learning, which selects the most informative (unlabelled) samples to be labelled by an expert, and we extend this acquisition function to the core-set problem, which consists of selecting the most informative samples \emph{given} the labels.
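For concreteness, here is a brief sketch of the two quantities the abstract refers to, written in standard notation rather than the paper's own; identifying the acquisition function as BALD (Houlsby et al., 2011) is an assumption on our part. MacKay's version of Stirling's approximation expresses the binomial coefficient via the binary entropy $H_2$, and BALD scores a candidate point $x$ by the mutual information between its label $Y$ and the model parameters $\Theta$ given the training data $\mathcal{D}$:
\[
\log_2 \binom{N}{r} \simeq N \, H_2\!\left(\tfrac{r}{N}\right),
\qquad
H_2(p) = p \log_2 \tfrac{1}{p} + (1-p) \log_2 \tfrac{1}{1-p},
\]
\[
\operatorname{BALD}(x) = \operatorname{I}[Y ; \Theta \mid x, \mathcal{D}]
= \operatorname{H}[Y \mid x, \mathcal{D}]
- \mathbb{E}_{p(\theta \mid \mathcal{D})} \operatorname{H}[Y \mid x, \theta].
\]
The first term of the BALD score measures total predictive uncertainty, while the second subtracts the uncertainty that remains once the parameters are fixed, leaving only the epistemic part that labelling $x$ would resolve.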