We study the problem of estimating the parameters of a Boolean product distribution in $d$ dimensions, when the samples are truncated by a set $S \subset \{0, 1\}^d$ accessible through a membership oracle. This is the first time that the computational and statistical complexity of learning from truncated samples is considered in a discrete setting. We introduce a natural notion of fatness of the truncation set $S$, under which truncated samples reveal enough information about the true distribution. We show that if the truncation set is sufficiently fat, samples from the true distribution can be generated from truncated samples. A stunning consequence is that virtually any statistical task (e.g., learning in total variation distance, parameter estimation, uniformity or identity testing) that can be performed efficiently for Boolean product distributions, can also be performed from truncated samples, with a small increase in sample complexity. We generalize our approach to ranking distributions over $d$ alternatives, where we show how fatness implies efficient parameter estimation of Mallows models from truncated samples. Exploring the limits of learning discrete models from truncated samples, we identify three natural conditions that are necessary for efficient identifiability: (i) the truncation set $S$ should be rich enough; (ii) $S$ should be accessible through membership queries; and (iii) the truncation by $S$ should leave enough randomness in all directions. By carefully adapting the Stochastic Gradient Descent approach of (Daskalakis et al., FOCS 2018), we show that these conditions are also sufficient for efficient learning of truncated Boolean product distributions.


翻译:我们研究以美元维度估算布林产品分布参数的问题,当样品通过会籍或触角可访问到的一组 $S\ subset = = 0, 1 ⁇ dd$ 时,当标本被一个固定的 $S = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

0
下载
关闭预览

相关内容

机器学习入门的经验与建议
专知会员服务
94+阅读 · 2019年10月10日
【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7
中国图象图形学学会CSIG
0+阅读 · 2021年11月15日
【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6
中国图象图形学学会CSIG
2+阅读 · 2021年11月12日
【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4
中国图象图形学学会CSIG
0+阅读 · 2021年11月10日
【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3
中国图象图形学学会CSIG
0+阅读 · 2021年11月9日
【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2
中国图象图形学学会CSIG
0+阅读 · 2021年11月8日
国家自然科学基金
1+阅读 · 2014年12月31日
国家自然科学基金
0+阅读 · 2012年12月31日
国家自然科学基金
0+阅读 · 2012年12月31日
国家自然科学基金
0+阅读 · 2009年12月31日
VIP会员
相关资讯
【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7
中国图象图形学学会CSIG
0+阅读 · 2021年11月15日
【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6
中国图象图形学学会CSIG
2+阅读 · 2021年11月12日
【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4
中国图象图形学学会CSIG
0+阅读 · 2021年11月10日
【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3
中国图象图形学学会CSIG
0+阅读 · 2021年11月9日
【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2
中国图象图形学学会CSIG
0+阅读 · 2021年11月8日
相关基金
国家自然科学基金
1+阅读 · 2014年12月31日
国家自然科学基金
0+阅读 · 2012年12月31日
国家自然科学基金
0+阅读 · 2012年12月31日
国家自然科学基金
0+阅读 · 2009年12月31日
Top
微信扫码咨询专知VIP会员