GPU-Accelerated Optimizer-Aware Evaluation of Submodular Exemplar Clustering


January 21, 2021


Philipp-Jan Honysz, Sebastian Buschjäger, Katharina Morik

The optimization of submodular functions constitutes a viable way to perform clustering. Strong approximation guarantees and feasible optimization w.r.t. streaming data make this clustering approach favorable. Technically, submodular functions map subsets of data to real values, which indicate how "representative" a specific subset is. Optimal sets might then be used to partition the data space and to infer clusters. Exemplar-based clustering is one of the possible submodular functions, but suffers from high computational complexity. However, for practical applications, the particular real-time or wall-clock run-time is decisive. In this work, we present a novel way to evaluate this particular function on GPUs, which keeps the necessities of optimizers in mind and reduces wall-clock run-time. To discuss our GPU algorithm, we investigated both the impact of different run-time critical problem properties, like data dimensionality and the number of data points in a subset, and the influence of required floating-point precision. In reproducible experiments, our GPU algorithm was able to achieve competitive speedups of up to 72x depending on whether multi-threaded computation on CPUs was used for comparison and the type of floating-point precision required. Half-precision GPU computation led to large speedups of up to 452x compared to single-precision, single-thread CPU computations.
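The abstract does not spell out the function being evaluated. As a rough illustration, the sketch below uses the formulation of exemplar-based clustering that is common in the submodular optimization literature: f(S) = L({e0}) - L(S ∪ {e0}), where L is the k-medoids loss over the ground set V and e0 is an auxiliary exemplar. The choice of squared Euclidean distance, the origin as e0, and all function names here are assumptions made for illustration, not details taken from the paper, and the code is a plain CPU reference rather than the authors' GPU algorithm.

```python
from typing import List, Optional, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed dissimilarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmedoids_loss(V: List[Point], S: List[Point]) -> float:
    """Mean distance from every point in V to its nearest exemplar in S."""
    return sum(min(sq_dist(v, e) for e in S) for v in V) / len(V)


def exemplar_value(V: List[Point], S: List[Point],
                   e0: Optional[Point] = None) -> float:
    """f(S) = L({e0}) - L(S + {e0}); e0 defaults to the origin (assumption)."""
    if e0 is None:
        e0 = [0.0] * len(V[0])
    return kmedoids_loss(V, [e0]) - kmedoids_loss(V, list(S) + [e0])


def greedy(V: List[Point], k: int) -> List[Point]:
    """Plain greedy maximizer: each round scores every candidate set S + {c}."""
    S: List[Point] = []
    for _ in range(k):
        candidates = [c for c in V if c not in S]
        best = max(candidates, key=lambda c: exemplar_value(V, S + [c]))
        S.append(best)
    return S


if __name__ == "__main__":
    # Two small, well-separated groups of points.
    V = [[1.0, 0.0], [1.0, 0.2], [5.0, 5.0], [5.2, 5.0]]
    print(greedy(V, 2))
```

One evaluation costs on the order of |V| · |S| · d distance terms for d-dimensional data, and a greedy optimizer requests one evaluation per remaining candidate in every round. These many similar evaluations are the kind of workload that a batched GPU evaluation can plausibly speed up.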

