We introduce a new family of energy-based probabilistic graphical models for efficient unsupervised learning. Its definition is motivated by the control of the spin-glass properties of the Ising model described by the weights of Boltzmann machines. We use it to learn the Bars and Stripes dataset of various sizes and the MNIST dataset, and show how they quickly achieve the performance offered by standard methods for unsupervised learning. Our results indicate that the standard initialization of Boltzmann machines with random weights equivalent to spin-glass models is an unnecessary bottleneck in the process of training. Furthermore, this new family allows for very easy access to low-energy configurations, which points to new, efficient training algorithms. The simplest variant of such algorithms approximates the negative phase of the log-likelihood gradient with no Markov chain Monte Carlo sampling costs at all, and with an accuracy sufficient to achieve good learning and generalization.
Intracranial hemorrhage occurs when blood vessels rupture or leak within the brain tissue or elsewhere inside the skull. It can be caused by physical trauma or by various medical conditions and in many cases leads to death. The treatment must be started as soon as possible, and therefore the hemorrhage should be diagnosed accurately and quickly. The diagnosis is usually performed by a radiologist who analyses a Computed Tomography (CT) scan containing a large number of cross-sectional images throughout the brain. Analysing each image manually can be very time-consuming, but automated techniques can help speed up the process. While much of the recent research has focused on solving this problem by using supervised machine learning algorithms, publicly-available training data remains scarce due to privacy concerns. This problem can be alleviated by unsupervised algorithms. In this paper, we propose a fully-unsupervised algorithm which is based on the mixture models. Our algorithm utilizes the fact that the properties of hemorrhage and healthy tissues follow different distributions, and therefore an appropriate formulation of these distributions allows us to separate them through an Expectation-Maximization process. In addition, our algorithm is able to adaptively determine the number of clusters such that all the hemorrhage regions can be found without including noisy voxels. We demonstrate the results of our algorithm on publicly-available datasets that contain all different hemorrhage types in various sizes and intensities, and our results are compared to earlier unsupervised and supervised algorithms. The results show that our algorithm can outperform the other algorithms with most hemorrhage types.
Data augmentation is an approach that can effectively improve the performance of multimodal machine learning. This paper introduces a generative model for data augmentation by leveraging the correlations among multiple modalities. Different from conventional data augmentation approaches that apply low level operations with deterministic heuristics, our method proposes to learn an augmentation sampler that generates samples of the target modality conditioned on observed modalities in the variational auto-encoder framework. Additionally, the proposed model is able to quantify the confidence of augmented data by its generative probability, and can be jointly updated with a downstream pipeline. Experiments on Visual Question Answering tasks demonstrate the effectiveness of the proposed generative model, which is able to boost the strong UpDn-based models to the state-of-the-art performance.
Markov chain Monte Carlo methods for intractable likelihoods, such as the exchange algorithm, require simulations of the sufficient statistics at every iteration of the Markov chain, which often result in expensive computations. Surrogate models for the likelihood function have been developed to accelerate inference algorithms in this context. However, these surrogate models tend to be relatively inflexible, and often provide a poor approximation to the true likelihood function. In this article, we propose the use of a warped, gradient-enhanced, Gaussian process surrogate model for the likelihood function, which jointly models the sample means and variances of the sufficient statistics, and uses warping functions to capture covariance nonstationarity in the input parameter space. We show that both the consideration of nonstationarity and the inclusion of gradient information can be leveraged to obtain a surrogate model that outperforms the conventional stationary Gaussian process surrogate model when making inference, particularly in regions where the likelihood function exhibits a phase transition. We also show that the proposed surrogate model can be used to improve the effective sample size per unit time when embedded in exact inferential algorithms. The utility of our approach in speeding up inferential algorithms is demonstrated on simulated and real-world data.
Model-free deep reinforcement learning has achieved great success in many domains, such as video games, recommendation systems and robotic control tasks. In continuous control tasks, widely used policies with Gaussian distributions results in ineffective exploration of environments and limited performance of algorithms in many cases. In this paper, we propose a density-free off-policy algorithm, Generative Actor-Critic(GAC), using the push-forward model to increase the expressiveness of policies, which also includes an entropy-like technique, MMD-entropy regularizer, to balance the exploration and exploitation. Additionnally, we devise an adaptive mechanism to automatically scale this regularizer, which further improves the stability and robustness of GAC. The experiment results show that push-forward policies possess desirable features, such as multi-modality, which can improve the efficiency of exploration and asymptotic performance of algorithms obviously.
We study the mean-field Ising spin glass model with external field, where the random symmetric couplings matrix is orthogonally invariant in law. For sufficiently high temperature, we prove that the replica-symmetric prediction is correct for the first-order limit of the free energy. Our analysis is an adaption of a "conditional quenched equals annealed" argument used by Bolthausen to analyze the high-temperature regime of the Sherrington-Kirkpatrick model. We condition on a sigma-field that is generated by the iterates of an Approximate Message Passing algorithm for solving the TAP equations in this model, whose rigorous state evolution was recently established.
Development and diffusion of machine learning and big data tools provide a new tool for architects and urban planners that could be used as analytical or design instruments. The topic investigated in this paper is the application of Generative Adversarial Networks to the design of an urban block. The research presents a flexible model able to adapt to the morphological characteristics of a city. This method does not define explicitly any of the parameters of an urban block typical for a city, the algorithm learns them from the existing urban context. This approach has been applied to the cities with different morphology: Milan, Amsterdam, Tallinn, Turin, and Bengaluru in order to see the performance of the model and the possibility of style translation between different cities. The data are gathered from Openstreetmap and Open Data portals of the cities. This research presents the results of the experiments and their quantitative and qualitative evaluation.
Coordinating inverters at scale under uncertainty is the desideratum for integrating renewables in distribution grids. Unless load demands and solar generation are telemetered frequently, controlling inverters given approximate grid conditions or proxies thereof becomes a key specification. Although deep neural networks (DNNs) can learn optimal inverter schedules, guaranteeing feasibility is largely elusive. Rather than training DNNs to imitate already computed optimal power flow (OPF) solutions, this work integrates DNN-based inverter policies into the OPF. The proposed DNNs are trained through two OPF alternatives that confine voltage deviations on the average and as a convex restriction of chance constraints. The trained DNNs can be driven by partial, noisy, or proxy descriptors of the current grid conditions. This is important when OPF has to be solved for an unobservable feeder. DNN weights are trained via back-propagation and upon differentiating the AC power flow equations assuming the network model is known. Otherwise, a gradient-free variant is put forth. The latter is relevant when inverters are controlled by an aggregator having access only to a power flow solver or a digital twin of the feeder. Numerical tests compare the DNN-based inverter control schemes with the optimal inverter setpoints in terms of optimality and feasibility.
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
This paper proposes a Reinforcement Learning (RL) algorithm to synthesize policies for a Markov Decision Process (MDP), such that a linear time property is satisfied. We convert the property into a Limit Deterministic Buchi Automaton (LDBA), then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product automaton, according to accepting conditions of the LDBA. With this reward function, our algorithm synthesizes a policy that satisfies the linear time property: as such, the policy synthesis procedure is "constrained" by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property, at any given state of the MDP - a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples. We observe an improvement of one order of magnitude in the number of iterations required for the synthesis compared to existing approaches.
Probabilistic topic models are popular unsupervised learning methods, including probabilistic latent semantic indexing (pLSI) and latent Dirichlet allocation (LDA). By now, their training is implemented on general purpose computers (GPCs), which are flexible in programming but energy-consuming. Towards low-energy implementations, this paper investigates their training on an emerging hardware technology called the neuromorphic multi-chip systems (NMSs). NMSs are very effective for a family of algorithms called spiking neural networks (SNNs). We present three SNNs to train topic models. The first SNN is a batch algorithm combining the conventional collapsed Gibbs sampling (CGS) algorithm and an inference SNN to train LDA. The other two SNNs are online algorithms targeting at both energy- and storage-limited environments. The two online algorithms are equivalent with training LDA by using maximum-a-posterior estimation and maximizing the semi-collapsed likelihood, respectively. They use novel, tailored ordinary differential equations for stochastic optimization. We simulate the new algorithms and show that they are comparable with the GPC algorithms, while being suitable for NMS implementation. We also propose an extension to train pLSI and a method to prune the network to obey the limited fan-in of some NMSs.