Coresets for Wasserstein Distributionally Robust Optimization Problems - 专知 (Zhuanzhi) Papers

Robust optimization · Robustness · Optimization problems · Uncertain data · Duality

April 4, 2023

Coresets for Wasserstein Distributionally Robust Optimization Problems

Ruomin Huang,Jiawei Huang,Wenjie Liu,Hu Ding

Wasserstein distributionally robust optimization (WDRO) is a popular model for making machine learning more robust to ambiguous data. However, WDRO can be prohibitively expensive in practice, since solving its "minimax" formulation requires a large amount of computation. Recently, several fast WDRO training algorithms have been developed for specific machine learning tasks (e.g., logistic regression). To the best of our knowledge, however, research on efficient algorithms for general large-scale WDRO problems is still quite limited. A coreset is an important tool for compressing large datasets, and it has been widely applied to reduce the computational complexity of many optimization problems. In this paper, we introduce a unified framework for constructing an $\epsilon$-coreset for general WDRO problems. Although it is challenging to obtain a conventional coreset for WDRO because of the uncertainty of ambiguous data, we show that a "dual coreset" can be computed by using the strong duality property of WDRO. Moreover, the error introduced by the dual coreset is theoretically guaranteed with respect to the original WDRO objective. To construct the dual coreset, we propose a novel grid sampling approach that is particularly suited to the dual formulation of WDRO. Finally, we implement our coreset approach and illustrate its effectiveness on several WDRO problems in experiments.
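
For readers who want the objects named in the abstract written out, the following is a sketch in notation that is assumed here and not taken from the paper: $\mathbb{P}_n$ is the empirical distribution of the data $\xi_1,\dots,\xi_n$, $\ell(\theta;\xi) \ge 0$ is the loss, $d$ is the ground metric, $\sigma$ is the radius of the Wasserstein ball, and $p \ge 1$ is its order. The dual shown is the standard strong-duality form for Wasserstein DRO, which holds under mild regularity conditions; the coreset condition is the generic relative-error convention with hypothetical weights $w_i$. The paper's exact statements may differ in details.

```latex
\begin{align}
% Primal "minimax" WDRO problem over a Wasserstein ball around the empirical distribution
\text{(WDRO)}\quad
  & \min_{\theta}\ \sup_{\mathbb{P}\,:\,W_p(\mathbb{P},\mathbb{P}_n)\le\sigma}
    \mathbb{E}_{\xi\sim\mathbb{P}}\bigl[\ell(\theta;\xi)\bigr] \\
% Strong dual: the inner supremum over distributions becomes a one-dimensional minimization in lambda
\text{(dual)}\quad
  & \min_{\theta}\ \min_{\lambda\ge 0}
    \Bigl\{\lambda\sigma^{p} + \frac{1}{n}\sum_{i=1}^{n} h_i(\theta,\lambda)\Bigr\},
  \qquad
  h_i(\theta,\lambda) = \sup_{\xi}\bigl[\ell(\theta;\xi) - \lambda\, d(\xi,\xi_i)^{p}\bigr] \\
% Generic relative-error coreset condition applied to the data-dependent part of the dual
\text{(dual coreset)}\quad
  & \Bigl|\sum_{i\in C} w_i\, h_i(\theta,\lambda) - \frac{1}{n}\sum_{i=1}^{n} h_i(\theta,\lambda)\Bigr|
    \le \epsilon\cdot\frac{1}{n}\sum_{i=1}^{n} h_i(\theta,\lambda)
  \quad\text{for all admissible } (\theta,\lambda)
\end{align}
```

Written this way, the dual objective is a sum of per-sample terms $h_i$, which is the form that sampling-based coreset constructions operate on; the primal supremum over distributions has no such decomposition, which is consistent with the abstract's remark that a conventional coreset is hard to obtain directly.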
