Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g., sparsemax, $\alpha$-entmax, and fusedmax) and corresponding losses, which have varying support. This paper expands that line of work in several directions: first, it extends $\Omega$-regularized prediction maps and Fenchel-Young losses to arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $\Omega$ is a Tsallis negentropy with parameter $\alpha$, we obtain "deformed exponential families," which include $\alpha$-entmax and sparsemax ($\alpha = 2$) as particular cases. For quadratic energy functions in continuous domains, the resulting densities are $\beta$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $\Omega$ is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $\alpha \in \{1, 4/3, 3/2, 2\}$. Using them, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.
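To make the finite-domain starting point concrete, here is a minimal sketch of sparsemax, i.e., $\alpha$-entmax with $\alpha = 2$, computed as the Euclidean projection of a score vector onto the probability simplex via the standard sort-and-threshold routine. The function name and implementation are illustrative, not code from the paper.

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex (entmax with alpha = 2).

    Unlike softmax, the result can assign exactly zero probability to
    low-scoring entries, giving a sparse (varying-support) distribution.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                # scores in descending order
    k = np.arange(1, z.size + 1)
    cssv = np.cumsum(z_sorted) - 1.0           # cumulative sums minus total mass
    support = z_sorted - cssv / k > 0          # entries kept in the support
    k_max = k[support][-1]                     # size of the support set
    tau = cssv[k_max - 1] / k_max              # threshold tau(z)
    return np.maximum(z - tau, 0.0)
```

For well-separated scores the output is sparse (e.g., `sparsemax([2.0, 1.0, -1.0])` puts all mass on the first entry), while for ties it recovers the uniform distribution, matching the varying-support behavior contrasted with softmax above.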