控制基于树的聚合中的假分割率 (Controlling the False Split Rate in Tree-Based Aggregation) - 专知论文

会员服务 ·

0

entity · 控制器 · 样例 · 块 · 统计方法 ·

2021 年 8 月 11 日

Controlling the False Split Rate in Tree-Based Aggregation

翻译：控制基于树的聚合中的假分割率

Simeng Shao,Jacob Bien,Adel Javanmard

from arxiv, 47 pages

In many domains, data measurements can naturally be associated with the leaves of a tree, expressing the relationships among these measurements. For example, companies belong to industries, which in turn belong to ever coarser divisions such as sectors; microbes are commonly arranged in a taxonomic hierarchy from species to kingdoms; street blocks belong to neighborhoods, which in turn belong to larger-scale regions. The problem of tree-based aggregation that we consider in this paper asks which of these tree-defined subgroups of leaves should really be treated as a single entity and which of these entities should be distinguished from each other. We introduce the "false split rate", an error measure that describes the degree to which subgroups have been split when they should not have been. We then propose a multiple hypothesis testing algorithm for tree-based aggregation, which we prove controls this error measure. We focus on two main examples of tree-based aggregation, one which involves aggregating means and the other which involves aggregating regression coefficients. We apply this methodology to aggregate stocks based on their volatility and to aggregate neighborhoods of New York City based on taxi fares.

翻译：在许多领域,数据测量自然可以与树叶相联系,表达这些测量之间的关系。例如,公司属于工业,而工业又属于部门等日益粗糙的分区;微生物通常是按物种到王国的分类等级排列的;街道街区属于社区,而社区又属于大面积区域。我们本文中考虑的基于树的集合问题,即树上的树块问题,我们想问这些树上界定的树叶分组中哪一个真正应该作为单一实体对待,哪些实体应该相互区别。我们引入了“假分裂率”这一错误计量标准,它描述了亚群在不应该进行分类时被分割的程度。我们然后提出了基于树的集合的多重假设算法,我们证明可以控制这种错误计量。我们侧重于基于树的集合的两个主要例子,一个是集成手段,另一个是累积回归系数。我们根据树块的波动性对总储量和根据出租车票价对纽约市的集合区进行这一方法。

0

相关内容

entity

经济学中的数据科学：机器学习与深度学习方法

经济学中的数据科学：机器学习与深度学习方法

专知会员服务

27+阅读 · 2020年10月19日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

个性化推荐系统技术进展

个性化推荐系统技术进展

专知会员服务

66+阅读 · 2020年8月15日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【实用书】流数据处理，Streaming Data，219页pdf

【实用书】流数据处理，Streaming Data，219页pdf

专知会员服务

78+阅读 · 2020年4月24日

【综述：心理学、神经科学和机器学习中的注意力】《Attention in Psychology, Neuroscience, and Machine Learning | Frontiers in Computational Neuroscience》

【综述：心理学、神经科学和机器学习中的注意力】《Attention in Psychology, Neuroscience, and Machine Learning | Frontiers in Computational Neuroscience》

专知会员服务

42+阅读 · 2020年4月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

Legion 一款网络渗透工具

Legion 一款网络渗透工具

黑白之道

6+阅读 · 2019年6月21日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

已删除

将门创投

3+阅读 · 2019年1月8日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

算法｜随机森林（Random Forest）

算法｜随机森林（Random Forest）

全球人工智能

3+阅读 · 2018年1月8日

【推荐】决策树/随机森林深入解析

【推荐】决策树/随机森林深入解析

机器学习研究会

5+阅读 · 2017年9月21日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Approximating Familywise Error Rate for Correlated Normal

Arxiv

0+阅读 · 2021年10月11日

Controllable Recommenders using Deep Generative Models and Disentanglement

Arxiv

0+阅读 · 2021年10月11日

A General Descent Aggregation Framework for Gradient-based Bi-level Optimization

Arxiv

0+阅读 · 2021年10月11日

Many Proxy Controls

Many Proxy Controls

Arxiv

0+阅读 · 2021年10月8日

Unveiling the Power of Mixup for Stronger Classifiers

Arxiv

0+阅读 · 2021年10月8日

A Design-Based Perspective on Synthetic Control Methods

Arxiv

0+阅读 · 2021年10月7日

Principal Neighbourhood Aggregation for Graph Nets

Principal Neighbourhood Aggregation for Graph Nets

Arxiv

17+阅读 · 2020年6月7日

Hierarchical Generative Modeling for Controllable Speech Synthesis

Hierarchical Generative Modeling for Controllable Speech Synthesis

Arxiv

3+阅读 · 2018年12月27日

Generative Adversarial Image Synthesis with Decision Tree Latent Controller

Arxiv

5+阅读 · 2018年5月27日

Controllable Generative Adversarial Network

Arxiv

5+阅读 · 2018年5月1日

VIP会员

文章信息

相关主题

相关VIP内容

经济学中的数据科学：机器学习与深度学习方法

经济学中的数据科学：机器学习与深度学习方法

专知会员服务

27+阅读 · 2020年10月19日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

个性化推荐系统技术进展

个性化推荐系统技术进展

专知会员服务

66+阅读 · 2020年8月15日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【实用书】流数据处理，Streaming Data，219页pdf

【实用书】流数据处理，Streaming Data，219页pdf

专知会员服务

78+阅读 · 2020年4月24日

【综述：心理学、神经科学和机器学习中的注意力】《Attention in Psychology, Neuroscience, and Machine Learning | Frontiers in Computational Neuroscience》

【综述：心理学、神经科学和机器学习中的注意力】《Attention in Psychology, Neuroscience, and Machine Learning | Frontiers in Computational Neuroscience》

专知会员服务

42+阅读 · 2020年4月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《俄乌战争中的无人系统：新的战争方式与新兴趋势——来自前线的印象》报告

《海上自主水面船舶远程操作中心：安全可持续运行的多维度分析》

多模态大语言模型下游调优中“保持自我”的重要性

隐身自主无人水下航行器技术如何变革水下作战并重塑海军竞争

相关资讯

Legion 一款网络渗透工具

Legion 一款网络渗透工具

黑白之道

6+阅读 · 2019年6月21日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

已删除

将门创投

3+阅读 · 2019年1月8日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

算法｜随机森林（Random Forest）

算法｜随机森林（Random Forest）

全球人工智能

3+阅读 · 2018年1月8日

【推荐】决策树/随机森林深入解析

【推荐】决策树/随机森林深入解析

机器学习研究会

5+阅读 · 2017年9月21日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Approximating Familywise Error Rate for Correlated Normal

Arxiv

0+阅读 · 2021年10月11日

Controllable Recommenders using Deep Generative Models and Disentanglement

Arxiv

0+阅读 · 2021年10月11日

A General Descent Aggregation Framework for Gradient-based Bi-level Optimization

Arxiv

0+阅读 · 2021年10月11日

Many Proxy Controls

Many Proxy Controls

Arxiv

0+阅读 · 2021年10月8日

Unveiling the Power of Mixup for Stronger Classifiers

Arxiv

0+阅读 · 2021年10月8日

A Design-Based Perspective on Synthetic Control Methods

Arxiv

0+阅读 · 2021年10月7日

Principal Neighbourhood Aggregation for Graph Nets

Principal Neighbourhood Aggregation for Graph Nets

Arxiv

17+阅读 · 2020年6月7日

Hierarchical Generative Modeling for Controllable Speech Synthesis

Hierarchical Generative Modeling for Controllable Speech Synthesis

Arxiv

3+阅读 · 2018年12月27日

Generative Adversarial Image Synthesis with Decision Tree Latent Controller

Arxiv

5+阅读 · 2018年5月27日

Controllable Generative Adversarial Network

Arxiv

5+阅读 · 2018年5月1日

微信扫码咨询专知VIP会员