We investigate the problem of minimizing the excess generalization error with respect to the best expert prediction in a finite family in the stochastic setting, under limited access to information. We assume that the learner only has access to a limited number of expert advices per training round, as well as for prediction. Assuming that the loss function is Lipschitz and strongly convex, we show that if we are allowed to see the advice of only one expert per round for T rounds in the training phase, or to use the advice of only one expert for prediction in the test phase, the worst-case excess risk is $\Omega$(1/ $\sqrt$ T) with probability lower bounded by a constant. However, if we are allowed to see at least two actively chosen expert advices per training round and use at least two experts for prediction, the fast rate O(1/T) can be achieved. We design novel algorithms achieving this rate in this setting, and in the setting where the learner has a budget constraint on the total number of observed expert advices, and give precise instance-dependent bounds on the number of training rounds and queries needed to achieve a given generalization error precision.

### 相关内容

Business process monitoring approaches have thus far mainly focused on monitoring the execution of a process with respect to a single process model. However, in some cases it is necessary to consider multiple process specifications simultaneously. In addition, these specifications can be procedural, declarative, or a combination of both. For example, in the medical domain, a clinical guideline describing the treatment of a specific disease cannot account for all possible co-factors that can coexist for a specific patient and therefore additional constraints may need to be considered. In some cases, these constraints may be incompatible with clinical guidelines, therefore requiring the violation of either the guidelines or the constraints. In this paper, we propose a solution for monitoring the interplay of hybrid process specifications expressed as a combination of (data-aware) Petri nets and temporal logic rules. During the process execution, if these specifications are in conflict with each other, it is possible to violate some of them. The monitoring system is equipped with a violation cost model according to which the system can recommend the next course of actions in a way that would either avoid possible violations or minimize the total cost of violations.

Modern longitudinal studies collect feature data at many timepoints, often of the same order of sample size. Such studies are typically affected by {dropout} and positivity violations. We tackle these problems by generalizing effects of recent incremental interventions (which shift propensity scores rather than set treatment values deterministically) to accommodate multiple outcomes and subject dropout. We give an identifying expression for incremental intervention effects when dropout is conditionally ignorable (without requiring treatment positivity), and derive the nonparametric efficiency bound for estimating such effects. Then we present efficient nonparametric estimators, showing that they converge at fast parametric rates and yield uniform inferential guarantees, even when nuisance functions are estimated flexibly at slower rates. We also study the variance ratio of incremental intervention effects relative to more conventional deterministic effects in a novel infinite time horizon setting, where the number of timepoints can grow with sample size, and show that incremental intervention effects yield near-exponential gains in statistical precision in this setup. Finally we conclude with simulations and apply our methods in a study of the effect of low-dose aspirin on pregnancy outcomes.

We establish generalization error bounds for stochastic gradient Langevin dynamics (SGLD) with constant learning rate under the assumptions of dissipativity and smoothness, a setting that has received increased attention in the sampling/optimization literature. Unlike existing bounds for SGLD in non-convex settings, ours are time-independent and decay to zero as the sample size increases. Using the framework of uniform stability, we establish time-independent bounds by exploiting the Wasserstein contraction property of the Langevin diffusion, which also allows us to circumvent the need to bound gradients using Lipschitz-like assumptions. Our analysis also supports variants of SGLD that use different discretization methods, incorporate Euclidean projections, or use non-isotropic noise.

The Unlimited Sensing Framework (USF) was recently introduced to overcome the sensor saturation bottleneck in conventional digital acquisition systems. At its core, the USF allows for high-dynamic-range (HDR) signal reconstruction by converting a continuous-time signal into folded, low-dynamic-range (LDR), modulo samples. HDR reconstruction is then carried out by algorithmic unfolding of the folded samples. In hardware, however, implementing an ideal modulo folding requires careful calibration, analog design and high precision. At the interface of theory and practice, this paper explores a computational sampling strategy that relaxes strict hardware requirements by compensating them via a novel, mathematically guaranteed recovery method. Our starting point is a generalized model for USF. The generalization relies on two new parameters modeling hysteresis and folding transients} in addition to the modulo threshold. Hysteresis accounts for the mismatch between the reset threshold and the amplitude displacement at the folding time and we refer to a continuous transition period in the implementation of a reset as folding transient. Both these effects are motivated by our hardware experiments and also occur in previous, domain-specific applications. We show that the effect of hysteresis is beneficial for the USF and we leverage it to derive the first recovery guarantees in the context of our generalized USF model. Additionally, we show how the proposed recovery can be directly generalized for the case of lower sampling rates. Our theoretical work is corroborated by hardware experiments that are based on a hysteresis enabled, modulo ADC testbed comprising off-the-shelf electronic components. Thus, by capitalizing on a collaboration between hardware and algorithms, our paper enables an end-to-end pipeline for HDR sampling allowing more flexible hardware implementations.

Quantization is one of the most effective methods to compress neural networks, which has achieved great success on convolutional neural networks (CNNs). Recently, vision transformers have demonstrated great potential in computer vision. However, previous post-training quantization methods performed not well on vision transformer, resulting in more than 1% accuracy drop even in 8-bit quantization. Therefore, we analyze the problems of quantization on vision transformers. We observe the distributions of activation values after softmax and GELU functions are quite different from the Gaussian distribution. We also observe that common quantization metrics, such as MSE and cosine distance, are inaccurate to determine the optimal scaling factor. In this paper, we propose the twin uniform quantization method to reduce the quantization error on these activation values. And we propose to use a Hessian guided metric to evaluate different scaling factors, which improves the accuracy of calibration with a small cost. To enable the fast quantization of vision transformers, we develop an efficient framework, PTQ4ViT. Experiments show the quantized vision transformers achieve near-lossless prediction accuracy (less than 0.5% drop at 8-bit quantization) on the ImageNet classification task.

Transfer learning from large-scale pre-trained models has become essential for many computer vision tasks. Recent studies have shown that datasets like ImageNet are weakly labeled since images with multiple object classes present are assigned a single label. This ambiguity biases models towards a single prediction, which could result in the suppression of classes that tend to co-occur in the data. Inspired by language emergence literature, we propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels using the framework of iterated learning. MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions through successive generations of teacher and student networks with a learning bottleneck. Experiments show that our approach exhibits systematic benefits on ImageNet accuracy as well as ReaL F1 score, which indicates that MILe deals better with label ambiguity than the standard training procedure, even when fine-tuning from self-supervised weights. We also show that MILe is effective reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy data such as WebVision. Furthermore, MILe improves performance in class incremental settings such as IIRC and it is robust to distribution shifts. Code: https://github.com/rajeswar18/MILe

In many contexts it is useful to predict the number of individuals in some population who will initiate a particular activity during a given period. For example, the number of users who will install a software update, the number of customers who will use a new feature on a website or who will participate in an A/B test. In practical settings, there is heterogeneity amongst individuals with regard to the distribution of time until they will initiate. For these reasons it is inappropriate to assume that the number of new individuals observed on successive days will be identically distributed. Given observations on the number of unique users participating in an initial period, we present a simple but novel Bayesian method for predicting the number of additional individuals who will subsequently participate during a subsequent period. We illustrate the performance of the method in predicting sample size in online experimentation.

We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$. This contrasts with a rate of $\mathcal{O}(1/\log(t))$ for standard gradient descent, and $\mathcal{O}(1/t)$ for normalized gradient descent. This momentum-based method is derived via the convex dual of the maximum-margin problem, and specifically by applying Nesterov acceleration to this dual, which manages to result in a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables.

In one-class-learning tasks, only the normal case (foreground) can be modeled with data, whereas the variation of all possible anomalies is too erratic to be described by samples. Thus, due to the lack of representative data, the wide-spread discriminative approaches cannot cover such learning tasks, and rather generative models, which attempt to learn the input density of the foreground, are used. However, generative models suffer from a large input dimensionality (as in images) and are typically inefficient learners. We propose to learn the data distribution of the foreground more efficiently with a multi-hypotheses autoencoder. Moreover, the model is criticized by a discriminator, which prevents artificial data modes not supported by data, and enforces diversity across hypotheses. Our multiple-hypothesesbased anomaly detection framework allows the reliable identification of out-of-distribution samples. For anomaly detection on CIFAR-10, it yields up to 3.9% points improvement over previously reported results. On a real anomaly detection task, the approach reduces the error of the baseline models from 6.8% to 1.5%.

Networks provide a powerful formalism for modeling complex systems, by representing the underlying set of pairwise interactions. But much of the structure within these systems involves interactions that take place among more than two nodes at once; for example, communication within a group rather than person-to-person, collaboration among a team rather than a pair of co-authors, or biological interaction between a set of molecules rather than just two. We refer to these type of simultaneous interactions on sets of more than two nodes as higher-order interactions; they are ubiquitous, but the empirical study of them has lacked a general framework for evaluating higher-order models. Here we introduce such a framework, based on link prediction, a fundamental problem in network analysis. The traditional link prediction problem seeks to predict the appearance of new links in a network, and here we adapt it to predict which (larger) sets of elements will have future interactions. We study the temporal evolution of 19 datasets from a variety of domains, and use our higher-order formulation of link prediction to assess the types of structural features that are most predictive of new multi-way interactions. Among our results, we find that different domains vary considerably in their distribution of higher-order structural parameters, and that the higher-order link prediction problem exhibits some fundamental differences from traditional pairwise link prediction, with a greater role for local rather than long-range information in predicting the appearance of new interactions.

Anti Alman,Fabrizio Maria Maggi,Marco Montali,Fabio Patrizi,Andrey Rivkin
0+阅读 · 11月25日
Kwangho Kim,Edward H. Kennedy,Ashley I. Naimi
0+阅读 · 11月25日
Tyler Farghly,Patrick Rebeschini
0+阅读 · 11月25日
Dorian Florescu,Felix Krahmer,Ayush Bhandari
0+阅读 · 11月24日
Zhihang Yuan,Chenhao Xue,Yiqi Chen,Qiang Wu,Guangyu Sun
0+阅读 · 11月24日
Sai Rajeswar,Pau Rodriguez,Soumye Singhal,David Vazquez,Aaron Courville
0+阅读 · 11月23日
Thomas Richardson,Yu Liu,James McQueen,Doug Hains
0+阅读 · 11月23日
Ziwei Ji,Nathan Srebro,Matus Telgarsky
4+阅读 · 7月1日
Duc Tam Nguyen,Zhongyu Lou,Michael Klar,Thomas Brox
6+阅读 · 2019年1月28日
Austin R. Benson,Rediet Abebe,Michael T. Schaub,Ali Jadbabaie,Jon Kleinberg
3+阅读 · 2018年2月20日

23+阅读 · 2019年10月17日

82+阅读 · 2019年10月12日

47+阅读 · 2019年10月10日

CreateAMind
8+阅读 · 2019年5月18日
CreateAMind
7+阅读 · 2019年1月7日
CreateAMind
21+阅读 · 2019年1月4日
CreateAMind
32+阅读 · 2019年1月3日

10+阅读 · 2018年12月24日

54+阅读 · 2018年1月26日

4+阅读 · 2017年11月28日

24+阅读 · 2017年11月16日

24+阅读 · 2017年9月8日

3+阅读 · 2017年8月6日
Top