分散的连续分发和Fenchel-Fenchel-Young损失 (Sparse Continuous Distributions and Fenchel-Young Losses)

Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $\alpha$-entmax, and fusedmax), has led to distributions with varying support. This paper develops sparse alternatives to continuous distributions, based on several technical contributions: First, we define $\Omega$-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $\Omega$ is a Tsallis negentropy with parameter $\alpha$, we obtain ``deformed exponential families,'' which include $\alpha$-entmax and sparsemax ($\alpha=2$) as particular cases. For quadratic energy functions, the resulting densities are $\beta$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $\Omega$ is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $\alpha \in \{1, 4/3, 3/2, 2\}$. Using these algorithms, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.

翻译：智能家庭被广泛用于机器学习, 包括许多在连续和离散域( 例如高斯、迪里赫莱特、波松, 以及通过软麦变形的绝对分布 ) 上分布。每个这些家族的分布都有固定的支持。相反, 在有限的域中, 最近关于软麦( (比如, 稀释, $=alpha- entmax) 的稀疏替代方法的工作, 导致了不同支持的分布。本文根据一些技术贡献, 开发了与连续分配( 比如, 高斯、迪里赫莱特、普罗森特、普罗森特、直立度预测地图地图和Fenchel- 青年对任意区域的损失( 可能是无限的或持续。对于线性化家庭而言, 最小化 Fencheltel- Young 损失相当于统计的瞬间匹配度, 直发式家庭的基本属性。当 $\ Omegatrequestational, livestal- developtical matical- developyal- demotional- demotional- demotions) exal- extial extial demotional demotional demotional demotional demotions, exal disal disal disal dism exal disal disal disal disal a exmisal develdiversal develmental a extial a extial devitional devitional devitionaldal devitionalmental develmental exmation.

相关内容

Continuity

关注 4

让 iOS 8 和 OS X Yosemite 无缝切换的一个新特性。 > Apple products have always been designed to work together beautifully. But now they may really surprise you. With iOS 8 and OS X Yosemite, you’ll be able to do more wonderful things than ever before.

Source: Apple - iOS 8

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

54+阅读 · 2021年1月20日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日