Monads are commonplace in computer science, and can be composed using Beck's distributive laws. Unfortunately, finding distributive laws can be extremely difficult and error-prone. The literature contains some general principles for constructing distributive laws. However, until now there have been no such techniques for establishing when no distributive law exists. We present three families of theorems for showing when there can be no distributive law between two monads. The first widely generalizes a counterexample attributed to Plotkin. It covers all the previously known no-go results for specific pairs of monads, and includes many new results. The second and third families are entirely novel, encompassing various new practical situations. For example, they negatively resolve the open question of whether the list monad distributes over itself, reveal a previously unobserved error in the literature, and confirm a conjecture made by Beck himself in his first paper on distributive laws. In addition, we establish conditions under which there can be at most one possible distributive law between two monads, proving various known distributive laws to be unique.
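
A classical positive example (not from this work) is the distributive law combining the list and maybe monads, given by "sequencing": a list of possibly-missing values becomes a possibly-missing list. A minimal Python sketch, using `None` to model Nothing:

```python
# Sketch of the classic law sequencing List and Maybe:
# a list of possibly-missing values becomes a possibly-missing list.
# Illustrative only; the abstract's point is that such laws often
# cannot exist, e.g. the list monad does not distribute over itself.

def swap(xs):
    """List(Maybe a) -> Maybe(List a), with None modeling Nothing."""
    out = []
    for x in xs:
        if x is None:          # a single Nothing makes the whole result Nothing
            return None
        out.append(x)
    return out
```

For instance, `swap([1, 2, 3])` yields `[1, 2, 3]`, while `swap([1, None, 3])` yields `None`.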

In this work, we introduce statistical testing under distributional shifts. We are interested in the hypothesis $P^* \in H_0$ for a target distribution $P^*$, but observe data from a different distribution $Q^*$. We assume that $P^*$ is related to $Q^*$ through a known shift $\tau$ and formally introduce hypothesis testing in this setting. We propose a general testing procedure that first resamples from the observed data to construct an auxiliary data set and then applies an existing test in the target domain. We prove that if the size of the resample is at most $o(\sqrt{n})$ and the resampling weights are well-behaved, this procedure inherits the pointwise asymptotic level and power from the target test. If the map $\tau$ is estimated from data, we can maintain the above guarantees under mild conditions if the estimation works sufficiently well. We further extend our results to uniform asymptotic level and a different resampling scheme. Testing under distributional shifts allows us to tackle a diverse set of problems. We argue that it may prove useful in reinforcement learning and covariate shift, we show how it reduces conditional to unconditional independence testing, and we provide example applications in causal inference.
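
As a rough illustration of the proposed procedure, a hedged sketch (the names `weight` and `target_test` are hypothetical stand-ins, not the authors' API): resample an auxiliary data set of size $o(\sqrt{n})$ with weights reflecting the shift $\tau$, then apply an existing test in the target domain:

```python
import random

def dshift_test(data, weight, target_test, seed=0):
    """Illustrative sketch of the resampling procedure described above.
    `weight(x)` stands in for a density-ratio-style resampling weight
    derived from the shift tau, and `target_test` is any existing test
    for the target domain; both names are hypothetical."""
    n = len(data)
    m = max(1, int(n ** 0.4))                 # m = n^0.4 is o(sqrt(n))
    rng = random.Random(seed)
    w = [weight(x) for x in data]
    aux = rng.choices(data, weights=w, k=m)   # weighted resample
    return target_test(aux)                   # run the target-domain test
```

With uniform weights the resample is an ordinary subsample; non-uniform weights tilt the auxiliary data toward the target distribution.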


In distributed applications, Brewer's CAP theorem tells us that when networks become partitioned, there is a tradeoff between consistency and availability. Consistency is agreement on the values of shared variables across a system, and availability is the ability to respond to reads and writes accessing those shared variables. We quantify these concepts, giving numerical values to inconsistency and unavailability. Recognizing that network partitioning is not an all-or-nothing proposition, we replace the P in CAP with L, a numerical measure of apparent latency, and derive the CAL theorem, an algebraic relation between inconsistency, unavailability, and apparent latency. This relation shows that if latency becomes unbounded (e.g., the network becomes partitioned), then one of inconsistency and unavailability must also become unbounded, and hence the CAP theorem is a special case of the CAL theorem. We describe two distributed coordination mechanisms, which we have implemented as an extension of the Lingua Franca coordination language, that support arbitrary tradeoffs between consistency and availability as apparent latency varies. With centralized coordination, inconsistency remains bounded by a chosen numerical value at the cost that unavailability becomes unbounded under network partitioning. With decentralized coordination, unavailability remains bounded by a chosen numerical quantity at the cost that inconsistency becomes unbounded under network partitioning. Our centralized coordination mechanism is an extension of techniques that have historically been used for distributed simulation, an application where consistency is paramount. Our decentralized coordination mechanism is an extension of techniques that have been used in distributed databases when availability is paramount.
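
The tradeoff can be made concrete with a toy read handler (purely illustrative; not the CAL algebra or the Lingua Franca mechanisms): when an update is known to be in flight, a replica either waits for it, trading availability for consistency, or answers immediately with a possibly stale value, trading consistency for availability:

```python
def handle_read(now, pending_update_arrival, value, stale_value, wait_bound):
    """Toy illustration of the consistency/availability tradeoff.
    A replica knows an update is in flight, arriving at
    `pending_update_arrival`.
    - If the update arrives within `wait_bound`, wait for it: the answer
      is consistent, but the response is delayed (unavailability = delay).
    - Otherwise answer immediately with the stale value: the response is
      instant, but inconsistent."""
    delay = pending_update_arrival - now
    if delay <= wait_bound:
        return value, delay        # consistent, paid for with unavailability
    return stale_value, 0.0        # available, paid for with inconsistency
```

Raising `wait_bound` bounds inconsistency at the cost of unbounded delay under partition, mirroring centralized coordination; lowering it mirrors decentralized coordination.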


A lattice is a partially ordered set in which every pair of elements has a unique least upper bound and greatest lower bound within the set. K\"{o}tter and Kschischang proved that codes in the linear lattice can be used for error and erasure correction in random networks. Codes in the linear lattice have previously been shown to be special cases of codes in modular lattices. Two well-known classes of modular lattices are geometric and distributive lattices. We identify the unique criterion that makes a geometric lattice distributive, thus characterizing all finite geometric distributive lattices. Our characterization helps to prove a conjecture regarding the maximum size of a distributive sublattice of a finite geometric lattice and to identify the maximal case. The Whitney numbers of the class of geometric distributive lattices are also calculated. We present a few other applications of this characterization, deriving results on linearity and complements in the linear lattice.
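
For illustration, the distributive law $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$ can be checked by brute force on a small lattice; the divisor lattice of $30$ under gcd/lcm is a standard distributive example (this generic check is not the characterization proved here):

```python
from itertools import product

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

def lcm(a, b):
    return a * b // gcd(a, b)

def is_distributive(elems, meet, join):
    """Brute-force check of x ∧ (y ∨ z) == (x ∧ y) ∨ (x ∧ z)
    over all triples of a finite lattice."""
    return all(meet(x, join(y, z)) == join(meet(x, y), meet(x, z))
               for x, y, z in product(elems, repeat=3))
```

Here `is_distributive(divisors(30), gcd, lcm)` returns `True`: divisibility ordering with gcd as meet and lcm as join is distributive.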


From the fundamental theorem of screening (FTS) we obtain the following mathematical relationship relating the pre-test probability of disease $\phi$ to the positive predictive value $\rho(\phi)$ of a screening test: $\displaystyle\lim_{\varepsilon \to 2}{\displaystyle \int_{0}^{1}}{\rho(\phi)d\phi} = 1$ where $\varepsilon$ is the screening coefficient, the sum of the sensitivity ($a$) and specificity ($b$) parameters of the test in question. However, given the invariant points on the screening plane, identical values of $\varepsilon$ may yield different shapes of the screening curve since $\varepsilon$ does not respect traditional commutative properties. In order to compare the performance between two screening curves with identical $\varepsilon$ values, we derive two geometric definitions of the positive likelihood ratio (LR+), defined as the likelihood of a positive test result in patients with the disease divided by the likelihood of a positive test result in patients without the disease, which help distinguish the performance of the two screening tests. The first definition uses the angle $\beta$ created on the vertical axis by the line between the origin invariant and the prevalence threshold $\phi_e$ such that $LR+ = \frac{a}{1-b} = \cot^2(\beta)$. The second definition projects two lines $(y_1,y_2)$ from the prevalence threshold to the invariant points and defines the LR+ as the ratio of their derivatives $\frac{dy_1}{dx}$ and $\frac{dy_2}{dx}$. Using the concepts of the prevalence threshold and the invariant points on the screening plane, the work herein presented provides a new geometric definition of the positive likelihood ratio (LR+) throughout the prevalence spectrum and describes a formal measure to compare the performance of two screening tests whose screening coefficients $\varepsilon$ are equal.
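
Assuming the standard Bayes form of the positive predictive value, $\rho(\phi) = a\phi/(a\phi + (1-b)(1-\phi))$, the limit above and the identity $LR+ = a/(1-b)$ can be checked numerically (an illustrative sketch, not the paper's derivation):

```python
def rho(phi, a, b):
    """Positive predictive value via Bayes' rule:
    sensitivity a, specificity b, pre-test probability phi."""
    return a * phi / (a * phi + (1 - b) * (1 - phi))

def integral_rho(a, b, steps=20000):
    """Midpoint-rule approximation of the integral of rho over [0, 1]."""
    h = 1.0 / steps
    return sum(rho((i + 0.5) * h, a, b) for i in range(steps)) * h

def lr_plus(a, b):
    """Positive likelihood ratio LR+ = a / (1 - b)."""
    return a / (1 - b)
```

As $\varepsilon = a + b \to 2$ (e.g. $a = b = 0.999$), `integral_rho` approaches $1$, in line with the FTS limit.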


We analyze several generic proximal splitting algorithms well suited for large-scale convex nonsmooth optimization. We derive sublinear and linear convergence results with new rates on the function value suboptimality or distance to the solution, as well as new accelerated versions, using varying stepsizes. In addition, we propose distributed variants of these algorithms, which can be accelerated as well. While most existing results are ergodic, our nonergodic results significantly broaden our understanding of primal-dual optimization algorithms.
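
As a reference point, one of the simplest proximal splitting methods, forward-backward (proximal gradient) iteration applied to the lasso objective $\frac12\|Ax-b\|^2 + \lambda\|x\|_1$, can be sketched as follows (a generic illustration, not the specific accelerated or distributed variants analyzed here):

```python
def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied coordinatewise."""
    return [max(abs(x) - t, 0.0) * (1 if x > 0 else -1) for x in v]

def prox_grad(A, b, lam, step, iters=500):
    """Forward-backward splitting for 0.5*||Ax - b||^2 + lam*||x||_1:
    a gradient step on the smooth term followed by the prox of the
    nonsmooth term, with a fixed stepsize."""
    n = len(A[0])
    m = len(A)
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]  # A^T r
        x = soft_threshold([x[j] - step * g[j] for j in range(n)], step * lam)
    return x
```

For $A = I$ this converges to the coordinatewise soft-thresholding of $b$, the closed-form lasso solution in that case.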


In this paper, we study a distributed learning problem under a constant-bit communication constraint. Specifically, we consider the distributed hypothesis testing (DHT) problem, where two distributed nodes are each constrained to transmit a constant number of bits to a central decoder. In this setting, we show that, in order to achieve the optimal error exponents, it suffices to consider the empirical distributions of the observed data sequences and encode them into the transmitted bits. With such a coding strategy, we develop a geometric approach in the space of distributions and characterize the optimal schemes. In particular, we show the optimal achievable error exponents and coding schemes for the following cases: (i) both nodes can transmit $\log_2 3$ bits; (ii) one of the nodes can transmit $1$ bit, and the other node is unconstrained; (iii) the nodes' observations are conditionally independent given one hypothesis. Furthermore, we provide several numerical examples illustrating the theoretical results. Our results provide theoretical guidance for designing practical distributed learning rules, and the developed approach also reveals new potential for establishing error exponents for DHT under more general communication constraints.
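
The coding idea can be caricatured in a few lines (a hypothetical quantizer for a binary alphabet, not the optimal schemes characterized here): each node summarizes its sequence by its empirical distribution and quantizes that into its message budget, e.g. $3$ messages for a $\log_2 3$-bit budget:

```python
def encode_type(seq, alphabet, num_messages):
    """Toy type-based encoder for DHT: compute the empirical frequency of
    the first symbol (for a binary alphabet this determines the whole
    empirical distribution) and quantize it into one of `num_messages`
    messages. The uniform quantizer is illustrative only."""
    n = len(seq)
    p = seq.count(alphabet[0]) / n          # empirical distribution of the data
    return min(int(p * num_messages), num_messages - 1)
```

The decoder then tests hypotheses using only the two quantized empirical distributions, which is what the abstract shows suffices for optimal error exponents.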


A Fr\'echet mean of a random variable $Y$ with values in a metric space $(\mathcal Q, d)$ is an element of the metric space that minimizes $q \mapsto \mathbb E[d(Y,q)^2]$. This minimizer may be non-unique. We study strong laws of large numbers for sets of generalized Fr\'echet means. The following generalizations are considered: the minimizers of $\mathbb E[d(Y, q)^\alpha]$ for $\alpha > 0$, the minimizers of $\mathbb E[H(d(Y, q))]$ for integrals $H$ of non-decreasing functions, and the minimizers of $\mathbb E[\mathfrak c(Y, q)]$ for a rather unrestricted class of cost functions $\mathfrak c$. We show convergence of empirical versions of these sets in the outer limit and in one-sided Hausdorff distance. The derived results require only minimal assumptions.
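
On a finite candidate set, the empirical generalized Fr\'echet mean set can be computed by direct search (an illustrative sketch; $\alpha = 2$ recovers the usual Fr\'echet mean and $\alpha = 1$ a metric median):

```python
def frechet_mean_set(sample, candidates, d, alpha=2.0, tol=1e-9):
    """Empirical generalized Frechet mean set: the candidates minimizing
    q -> mean of d(y, q)**alpha over the sample. Grid-search sketch for a
    finite candidate set; the minimizer may be non-unique, so a set of
    (near-)minimizers is returned."""
    F = [sum(d(y, q) ** alpha for y in sample) / len(sample)
         for q in candidates]
    m = min(F)
    return [q for q, f in zip(candidates, F) if f <= m + tol]
```

On the real line with $d(a,b) = |a-b|$, $\alpha = 2$ picks out the sample mean and $\alpha = 1$ the sample median, matching the classical special cases.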


Support constrained generator matrices for linear codes have found applications in multiple access networks and weakly secure document exchange. Necessary and sufficient conditions for the existence of Reed-Solomon codes with support constrained generator matrices were conjectured by Dau, Song, Yuen and Hassibi. This conjecture, called the GM-MDS conjecture, was recently proved in independent works of Lovett and Yildiz-Hassibi. It implies that support constrained generator matrices for MDS codes exist over small fields of linear size. In this paper we propose a natural generalization of this conjecture for support constrained matrices based on the generalized Hamming weights (the SCGM-GHW conjecture). The GM-MDS conjecture can be thought of as a very special case of our SCGM-GHW conjecture, for linear $1$-MDS codes. We investigate support constrained generator matrices for some linear codes, such as $2$-MDS codes, first order Reed-Muller codes and algebraic-geometric codes from elliptic curves, from the viewpoint of our SCGM-GHW conjecture. In particular, the direct generalization of the GM-MDS conjecture from $1$-MDS to $2$-MDS codes is not true: for linear $2$-MDS codes, cardinality-based constraints on the subset systems alone are not sufficient to ensure that these subsets lie in the zero-coordinate position sets of rows of a generator matrix.
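
For orientation, the cardinality condition in the GM-MDS theorem, as commonly stated (reproduced here from memory of the literature, so treat it as an assumption), can be checked by brute force; the point of the $2$-MDS results above is that conditions of this purely cardinality-based kind no longer suffice:

```python
from itertools import combinations

def gm_mds_condition(zero_sets, k):
    """Check the cardinality condition commonly stated for GM-MDS:
    for every nonempty subset Omega of the k rows,
    |intersection of Z_i over Omega| + |Omega| <= k,
    where Z_i is the prescribed zero-coordinate set of row i.
    (Assumed form of the condition, for illustration only.)"""
    sets = [frozenset(z) for z in zero_sets]
    for r in range(1, len(sets) + 1):
        for omega in combinations(sets, r):
            inter = frozenset.intersection(*omega)
            if len(inter) + r > k:
                return False
    return True
```

A single zero set of size $k$ already violates the condition, since its row would have to be the zero codeword.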


Adversarial training is among the most effective techniques for improving the robustness of models against adversarial perturbations. However, the full effect of this approach on models is not well understood. For example, while adversarial training can reduce the adversarial risk (prediction error against an adversary), it sometimes increases the standard risk (generalization error when there is no adversary). Moreover, such behavior is impacted by various elements of the learning problem, including the size and quality of training data, the specific form of adversarial perturbations of the input, model overparameterization, and the adversary's power, among others. In this paper, we focus on the \emph{distribution-perturbing} adversary framework, wherein the adversary can change the test distribution within a neighborhood of the training data distribution. The neighborhood is defined via the Wasserstein distance between distributions, and the radius of the neighborhood is a measure of the adversary's manipulative power. We study the tradeoff between standard risk and adversarial risk and derive the Pareto-optimal tradeoff, achievable over specific classes of models, in the infinite data limit with the feature dimension kept fixed. We consider three learning settings: 1) regression with the class of linear models; 2) binary classification under the Gaussian mixtures data model, with the class of linear classifiers; 3) regression with the class of random features models (which can be equivalently represented as two-layer neural networks with random first-layer weights). We show that a tradeoff between standard and adversarial risk is manifested in all three settings. We further characterize the Pareto-optimal tradeoff curves and discuss how a variety of factors, such as feature correlation, the adversary's power, or the width of the two-layer neural network, affect this tradeoff.
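
A crude Monte-Carlo illustration of the tradeoff (a worst-case mean-shift stand-in for a Wasserstein perturbation, not the paper's analysis): for scalar linear regression, shrinking the model coefficient below the optimum raises standard risk but lowers worst-case risk under test-input translation:

```python
import random

def risks(theta, delta, n=20000, seed=0):
    """Monte-Carlo sketch: data y = x + noise, scalar model y_hat = theta*x.
    The adversary translates each test input by at most delta in the worst
    direction, a crude stand-in for perturbing the test distribution
    within a Wasserstein ball of radius delta."""
    rng = random.Random(seed)
    std = adv = 0.0
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 0.5)
        std += (theta * x - y) ** 2
        # worst translation aligns theta*shift with the residual's sign
        shift = delta if theta * (theta * x - y) >= 0 else -delta
        adv += (theta * (x + shift) - y) ** 2
    return std / n, adv / n
```

Comparing `risks(1.0, 1.0)` and `risks(0.5, 1.0)` shows standard risk rising while adversarial risk falls, the tradeoff discussed above.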


We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle:=\mathrm{tr}(X_i^\top A)$; perhaps surprisingly, these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
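
For contrast, the earlier submatrix-based approach mentioned above can be sketched directly (an illustrative baseline over the integers, not this paper's non-adaptive query pattern): a rank greater than $d$ is witnessed by a nonsingular $(d+1)\times(d+1)$ submatrix:

```python
import random
from fractions import Fraction

def rank(M):
    """Gaussian elimination over the rationals (exact for integer input)."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def submatrix_rank_test(A, d, trials=50, seed=0):
    """Baseline-style tester: sample random (d+1)x(d+1) submatrices and
    report rank > d iff one of them is nonsingular. This is the kind of
    submatrix-reading tester the lower bound above applies to."""
    rng = random.Random(seed)
    n = len(A)
    for _ in range(trials):
        rows = rng.sample(range(n), d + 1)
        cols = rng.sample(range(n), d + 1)
        if rank([[A[i][j] for j in cols] for i in rows]) == d + 1:
            return True          # witness found: rank exceeds d
    return False
```

Such a tester needs $\Omega(d^2/\epsilon^2)$ entry reads in the worst case, which is precisely the bottleneck the paper's non-submatrix query pattern avoids.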

Related papers
Nikolaj Thams, Sorawit Saengkyongam, Niklas Pfister, Jonas Peters · Sep 16
Edward A. Lee, Soroush Bateni, Shaokai Lin, Marten Lohstroh, Christian Menard · Sep 16
Laurent Condat, Grigory Malinovsky, Peter Richtárik · Sep 14
Xiangxiang Xu, Shao-Lun Huang · Sep 14
Mohammad Mehrabi, Adel Javanmard, Ryan A. Rossi, Anup Rao, Tung Mai · Jan 15
Maria-Florina Balcan, Yi Li, David P. Woodruff, Hongyang Zhang · Oct 18, 2018