Neuronal representations within artificial neural networks are commonly understood as logits, representing the log-odds score of presence (versus absence) of features within the stimulus. Under this interpretation, we can derive from their logits the probability $P(x_0 \land x_1)$ that a pair of independent features are both present in the stimulus. By converting the resulting probability back into a logit, we obtain a logit-space equivalent of the AND operation. However, since this function involves taking multiple exponentials and logarithms, it is not well suited for direct use within neural networks. We thus constructed an efficient approximation named $\text{AND}_\text{AIL}$ (the AND operator Approximated for Independent Logits) utilizing only comparison and addition operations, which can be deployed as an activation function in neural networks. Like MaxOut, $\text{AND}_\text{AIL}$ is a generalization of ReLU to two dimensions. Additionally, we constructed efficient approximations of the logit-space equivalents to the OR and XNOR operators. We deployed these new activation functions, both in isolation and in conjunction, and demonstrated their effectiveness on a variety of tasks including image classification, transfer learning, abstract reasoning, and compositional zero-shot learning.
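
The exact operator follows directly from the probabilistic reading above and is easy to state in code. The sketch below derives the exact logit-space AND from the independence assumption and contrasts it with a comparison-and-addition-only approximation; since the abstract does not spell out the closed form of $\text{AND}_\text{AIL}$, the piecewise expression used here is a plausible reconstruction that matches the exact operator's asymptotic behaviour, not necessarily the paper's definition.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p) - np.log1p(-p)

def and_exact(x, y):
    # Exact logit-space AND for independent features: if x and y are
    # log-odds of two independent events, P(both) = sigma(x) * sigma(y);
    # we then map that probability back into logit space.
    return logit(sigmoid(x) * sigmoid(y))

def and_ail_sketch(x, y):
    # Comparison/addition-only approximation. NOTE: this piecewise form is
    # a reconstruction matching the exact operator's asymptotics
    # (min(x, y) when either input is large and positive, x + y when both
    # are very negative), not necessarily the paper's exact definition.
    return np.minimum(x, y) + np.minimum(np.maximum(x, y), 0.0)

xs = np.array([-4.0, -1.0, 0.5, 3.0])
ys = np.array([-3.0, 2.0, 0.5, 4.0])
print(and_exact(xs, ys))        # two confident negatives reinforce: ~ x + y
print(and_ail_sketch(xs, ys))   # the approximation tracks the exact values
```

Under the same reconstruction, a logit-space OR follows by De Morgan duality as $-\text{AND}(-x, -y) = \max(x, y) + \max(\min(x, y), 0)$.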

Related content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). It provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analyses, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles appear in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, and engineering and applications. Official website: http://dblp.uni-trier.de/db/journals/nn/

Response functions linking regression predictors to properties of the response distribution are fundamental components in many statistical models. However, the choice of these functions is typically based on the domain of the modeled quantities and is not further scrutinized. For example, the exponential response function is usually assumed for parameters restricted to be positive although it implies a multiplicative model which may not necessarily be desired. Consequently, applied researchers might easily face misleading results when relying on defaults without further investigation. As an alternative to the exponential response function, we propose the use of the softplus function to construct alternative link functions for parameters restricted to be positive. As a major advantage, we can construct differentiable link functions corresponding closely to the identity function for positive values of the regression predictor, which implies a quasi-additive model and thus allows for an additive interpretation of the estimated effects by practitioners. We demonstrate the applicability of the softplus response function using both simulations and real data. In four applications featuring count data regression and Bayesian distributional regression, we contrast our approach to the commonly used exponential response function.
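
A minimal numerical sketch of the contrast drawn here: the exponential response function turns additive shifts in the predictor into multiplicative effects on the parameter, while the softplus function $\log(1 + e^\eta)$ stays positive everywhere yet approaches the identity for positive predictor values, which is what makes the model quasi-additive.

```python
import numpy as np

def softplus(eta):
    # Numerically stable log(1 + exp(eta)); positive for all eta and
    # approximately the identity once eta is a few units above zero.
    return np.logaddexp(0.0, eta)

eta = np.array([-2.0, 0.0, 2.0, 5.0, 10.0])
print(np.exp(eta))    # multiplicative: a +1 shift in eta multiplies the mean by e
print(softplus(eta))  # quasi-additive: for eta >> 0, softplus(eta) ~ eta
```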

Consider the problem of learning a large number of response functions simultaneously based on the same input variables. The training data consist of a single independent random sample of the input variables drawn from a common distribution together with the associated responses. The input variables are mapped into a high-dimensional linear space, called the feature space, and the response functions are modelled as linear functionals of the mapped features, with coefficients calibrated via ordinary least squares. We provide convergence guarantees on the worst-case excess prediction risk by controlling the convergence rate of the excess risk uniformly in the response function. The dimension of the feature map is allowed to tend to infinity with the sample size. The collection of response functions, although potentially infinite, is supposed to have a finite Vapnik-Chervonenkis dimension. The derived bound can be applied when building multiple surrogate models within a reasonable computing time.
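
A minimal sketch of the setup, using a random Fourier feature map purely as a stand-in (the theory allows a generic feature map whose dimension may grow with the sample size): because all responses share the same design matrix, a single least-squares factorization calibrates every set of coefficients at once.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, k = 200, 3, 50, 100   # samples, inputs, feature dim, number of responses

X = rng.normal(size=(n, d))
# Stand-in feature map (random Fourier features).
W, b = rng.normal(size=(d, m)), rng.uniform(0, 2 * np.pi, size=m)
Phi = np.sqrt(2.0 / m) * np.cos(X @ W + b)

# A family of k response functions observed on the same sample.
Y = np.column_stack([np.sin(X @ rng.normal(size=d)) for _ in range(k)])

# One least-squares solve serves every response simultaneously:
coef, *_ = np.linalg.lstsq(Phi, Y, rcond=None)   # shape (m, k)
print("worst in-sample MSE over all responses:",
      np.max(np.mean((Phi @ coef - Y) ** 2, axis=0)))
```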

In this study, a novel and general activation function termed MDAC is proposed to overcome the problems of gradient vanishing and non-differentiability. MDAC approximately inherits the properties of exponential activation functions (such as the Tanh family) and piecewise linear activation functions (such as the ReLU family). Specifically, in the positive region, an adaptive linear structure is designed to respond to various domain distributions. In the negative region, a combination of exponential and linear terms is used to overcome gradient vanishing. Furthermore, the non-differentiability is eliminated by smooth approximation. Experiments show that MDAC improves the performance of both classical models and pre-training optimization models on six datasets from different domains simply by changing the activation function, which indicates MDAC's effectiveness and progressiveness. MDAC is superior to other prevalent activation functions in robustness and generalization, and exhibits excellent activation performance across multiple domains.
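
The abstract does not give MDAC's closed form, so the following is a hypothetical sketch only, mirroring the ingredients it describes: an adaptive linear branch on the positive region (the slope `a` would be learned) and an exponential-plus-linear branch on the negative region so the gradient never vanishes entirely. The actual MDAC formula may differ substantially.

```python
import numpy as np

def mdac_like(x, a=1.0, alpha=1.0):
    # HYPOTHETICAL sketch, not the published MDAC formula.
    # Negative branch: bounded exponential term plus a small linear leak,
    # so the derivative tends to 0.1 (not 0) as x -> -inf.
    neg = alpha * (np.exp(x) - 1.0) + 0.1 * x
    # Positive branch: adaptive linear response with slope a.
    return np.where(x >= 0, a * x, neg)

x = np.linspace(-5, 5, 11)
print(mdac_like(x))   # continuous at 0; nonvanishing gradient on the left
```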

Optimizing highly complex cost/energy functions over discrete variables is at the heart of many open problems across different scientific disciplines and industries. A major obstacle is the emergence of many-body effects among certain subsets of variables in hard instances leading to critical slowing down or collective freezing for known stochastic local search strategies. An exponential computational effort is generally required to unfreeze such variables and explore other unseen regions of the configuration space. Here, we introduce a quantum-inspired family of nonlocal Nonequilibrium Monte Carlo (NMC) algorithms by developing an adaptive gradient-free strategy that can efficiently learn key instance-wise geometrical features of the cost function. That information is employed on-the-fly to construct spatially inhomogeneous thermal fluctuations for collectively unfreezing variables at various length scales, circumventing costly exploration versus exploitation trade-offs. We apply our algorithm to two of the most challenging combinatorial optimization problems: random k-satisfiability (k-SAT) near the computational phase transitions and Quadratic Assignment Problems (QAP). We observe significant speedup and robustness over both specialized deterministic solvers and generic stochastic solvers. In particular, for 90% of random 4-SAT instances we find solutions that are inaccessible to the best specialized deterministic algorithm, known as Survey Propagation (SP), with an order of magnitude improvement in the quality of solutions for the hardest 10% of instances. We also demonstrate two orders of magnitude improvement in time-to-solution over the state-of-the-art generic stochastic solver known as Adaptive Parallel Tempering (APT).
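
The toy below is not the authors' NMC algorithm, only an illustration of its central idea on random 4-SAT: a Metropolis sampler whose inverse temperature is spatially inhomogeneous, with variables that appear frozen (no accepted flip for a long time) heated locally so they can unfreeze. The freezing heuristic and all constants here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 60, 240, 4                      # variables, clauses, clause size (4-SAT)
vars_ = np.array([rng.choice(n, k, replace=False) for _ in range(m)])
signs = rng.choice([-1, 1], size=(m, k))  # literal polarities

def n_unsat(s):
    # A clause is satisfied when any of its literals agrees with s.
    return np.sum(~np.any(s[vars_] == signs, axis=1))

s = rng.choice([-1, 1], size=n)
beta = np.full(n, 0.5)                    # per-variable inverse temperature
last_flip = np.zeros(n)

for t in range(20000):
    i = rng.integers(n)
    e0 = n_unsat(s)
    s[i] *= -1
    de = n_unsat(s) - e0
    if de > 0 and rng.random() >= np.exp(-beta[i] * de):
        s[i] *= -1                        # Metropolis reject
    else:
        last_flip[i] = t                  # accepted flip
    # Crude stand-in for NMC's learned instance-wise geometry: a variable
    # that has not flipped for a long time is treated as frozen and
    # heated locally (lower beta); active variables stay cool.
    beta[i] = 0.5 if t - last_flip[i] < 500 else 0.1

print("unsatisfied clauses:", n_unsat(s))
```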

We prove necessary density conditions for sampling in spectral subspaces of a second order uniformly elliptic differential operator on $\mathbb{R}^d$ with slowly oscillating symbol. For constant coefficient operators, these are precisely Landau's necessary density conditions for bandlimited functions, but for more general elliptic differential operators it has been unknown whether such a critical density even exists. Our results prove the existence of a suitable critical sampling density and compute it in terms of the geometry defined by the elliptic operator. In dimension 1, functions in a spectral subspace can be interpreted as functions with variable bandwidth, and we obtain a new critical density for variable bandwidth. The methods are a combination of the spectral theory and the regularity theory of elliptic partial differential operators, some elements of limit operators, certain compactifications of $\mathbb{R}^d$, and the theory of reproducing kernel Hilbert spaces.
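
For reference, the constant-coefficient special case invoked above is Landau's classical necessary density condition for bandlimited functions; the statement below uses one common normalization of the Fourier transform (the constant on the right shifts with the convention).

```latex
% Landau: if \Lambda \subset \mathbb{R}^d is a set of stable sampling for
% the Paley--Wiener space of functions with spectrum in \Omega, then its
% lower Beurling density satisfies
D^{-}(\Lambda)
  \;=\; \liminf_{r \to \infty}\; \inf_{x \in \mathbb{R}^d}
        \frac{\#\bigl(\Lambda \cap (x + [0, r]^d)\bigr)}{r^d}
  \;\ge\; \frac{|\Omega|}{(2\pi)^d}.
```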

Analogy-making is at the core of human and artificial intelligence and creativity with applications to such diverse tasks as commonsense reasoning, learning, language acquisition, and storytelling. This paper introduces from first principles an abstract algebraic framework of analogical proportions of the form "$a$ is to $b$ what $c$ is to $d$" in the general setting of universal algebra. This enables us to compare mathematical objects possibly across different domains in a uniform way, which is crucial for AI systems. It turns out that our notion of analogical proportions has appealing mathematical properties. As we construct our model from first principles using only elementary concepts of universal algebra, and since our model questions some basic properties of analogical proportions presupposed in the literature, to convince the reader of the plausibility of our model we show that it can be naturally embedded into first-order logic via model-theoretic types and prove from that perspective that analogical proportions are compatible with structure-preserving mappings. This provides conceptual evidence for its applicability. In a broader sense, this paper is a first step towards a theory of analogical reasoning and learning systems with potential applications to fundamental AI problems like commonsense reasoning and computational learning and creativity.
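
As a toy illustration only (the paper's universal-algebra definition, built on sets of justifying terms, is considerably more subtle): a proportion "$a$ is to $b$ what $c$ is to $d$" can be witnessed by a term function, built from the algebra's operations, that maps $a$ to $b$ and $c$ to $d$ alike. The small stock of unary terms below over the algebra $(\mathbb{N}, +, \cdot)$ is a hypothetical miniature.

```python
def term_functions(consts=range(0, 5)):
    # A small stock of unary terms over the algebra (N, +, *).
    for c in consts:
        yield (f"x + {c}", lambda x, c=c: x + c)
        yield (f"x * {c}", lambda x, c=c: x * c)

def justifications(a, b, c, d):
    # Terms t with t(a) == b and t(c) == d justify "a : b :: c : d".
    return [name for name, t in term_functions() if t(a) == b and t(c) == d]

print(justifications(2, 4, 3, 5))   # ['x + 2']  -> 2 : 4 :: 3 : 5 via addition
print(justifications(2, 4, 3, 6))   # ['x * 2']  -> 2 : 4 :: 3 : 6 via doubling
```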

Feature attribution is often loosely presented as the process of selecting a subset of relevant features as a rationale for a prediction. This lack of clarity stems from the fact that we usually do not have access to any notion of ground-truth attribution and from a more general debate on what good interpretations are. In this paper we propose to formalise feature selection/attribution based on the concept of relaxed functional dependence. In particular, we extend our notions to the instance-wise setting and derive necessary properties for candidate selection solutions, while leaving room for task-dependence. By computing ground-truth attributions on synthetic datasets, we evaluate many state-of-the-art attribution methods and show that, even when optimised, some fail to verify the proposed properties and provide wrong solutions.
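
A sketch of the strict core of this idea on a synthetic dataset where the ground-truth attribution is known by construction: a feature subset is a candidate rationale when it functionally determines the label. The relaxation the paper actually works with tolerates approximate dependence; the exact check below is a simplification.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 4))   # four binary features
y = X[:, 0] ^ X[:, 1]                    # label depends only on features {0, 1}

def determines(S, X, y):
    # S functionally determines y if every realisation of X[:, S] maps to
    # a single label value (strict dependence; the paper's relaxed version
    # would allow this to hold only approximately).
    groups = {}
    for xs, yi in zip(map(tuple, X[:, S]), y):
        groups.setdefault(xs, set()).add(yi)
    return all(len(v) == 1 for v in groups.values())

for r in range(5):                        # smallest sufficient subset first
    for S in combinations(range(4), r):
        if determines(list(S), X, y):
            print("sufficient subset:", S)   # finds (0, 1), the ground truth
            break
    else:
        continue
    break
```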

Knowledge graph reasoning, which aims at predicting the missing facts through reasoning with the observed facts, is critical to many applications. Such a problem has been widely explored by traditional logic rule-based approaches and recent knowledge graph embedding methods. A principled logic rule-based approach is the Markov Logic Network (MLN), which is able to leverage domain knowledge with first-order logic and meanwhile handle their uncertainty. However, the inference of MLNs is usually very difficult due to the complicated graph structures. Different from MLNs, knowledge graph embedding methods (e.g., TransE, DistMult) learn effective entity and relation embeddings for reasoning, which are much more effective and efficient. However, they are unable to leverage domain knowledge. In this paper, we propose the probabilistic Logic Neural Network (pLogicNet), which combines the advantages of both methods. A pLogicNet defines the joint distribution of all possible triplets by using a Markov logic network with first-order logic, which can be efficiently optimized with the variational EM algorithm. In the E-step, a knowledge graph embedding model is used for inferring the missing triplets, while in the M-step, the weights of logic rules are updated based on both the observed and predicted triplets. Experiments on multiple knowledge graphs demonstrate the effectiveness of pLogicNet over many competitive baselines.
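
A structural sketch of the variational EM loop described here, not the authors' implementation: the toy below uses a DistMult-style score as the embedding model, a single implication rule friend(x, y) => colleague(x, y), and crude SGD and counting updates. The thresholds, learning rate, and smoothing are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
E, R, dim = 6, 2, 8                        # entities, relations, embedding dim
observed = {(0, 0, 1), (2, 0, 3), (4, 0, 5),   # friend pairs   (r = 0)
            (0, 1, 1), (2, 1, 3)}              # colleague pairs (r = 1)
candidates = [(h, r, t) for h in range(E) for r in range(R)
              for t in range(E) if h != t and (h, r, t) not in observed]

ent = rng.normal(scale=0.1, size=(E, dim))
rel = rng.normal(scale=0.1, size=(R, dim))
w_rule = 0.0                               # weight of friend(x,y) => colleague(x,y)

def score(h, r, t):                        # DistMult-style triple score
    return float(np.sum(ent[h] * rel[r] * ent[t]))

for _ in range(20):                        # variational EM loop
    # E-step: the embedding model proposes hidden triplets; the rule adds
    # evidence for colleague(x,y) whenever friend(x,y) is believed.
    beliefs = {}
    for (h, r, t) in candidates:
        logit = score(h, r, t)
        if r == 1 and ((h, 0, t) in observed or beliefs.get((h, 0, t), 0) > 0.5):
            logit += w_rule
        beliefs[(h, r, t)] = 1 / (1 + np.exp(-logit))
    predicted = {tr for tr, p in beliefs.items() if p > 0.55}  # liberal toy cutoff
    # Embedding update: a few SGD steps on observed + predicted triplets.
    for (h, r, t) in list(observed | predicted):
        g = 1 - 1 / (1 + np.exp(-score(h, r, t)))   # push the score upward
        ent[h] += 0.1 * g * rel[r] * ent[t]
        ent[t] += 0.1 * g * rel[r] * ent[h]
        rel[r] += 0.1 * g * ent[h] * ent[t]
    # M-step: re-fit the rule weight from how often the rule's head holds
    # when its body holds, under the current beliefs (smoothed log-odds).
    facts = observed | predicted
    body = [(h, t) for (h, r, t) in facts if r == 0]
    hit = sum((h, 1, t) in facts for (h, t) in body)
    w_rule = np.log((hit + 1) / (len(body) - hit + 1))

print("rule weight:", round(w_rule, 2))
print("predicted colleague links:", sorted(t for t in predicted if t[1] == 1))
```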

Large scale knowledge graph embedding has attracted much attention from both academia and industry in the field of Artificial Intelligence. However, most existing methods concentrate solely on fact triples contained in the given knowledge graph. Inspired by the fact that logic rules can provide a flexible and declarative language for expressing rich background knowledge, it is natural to integrate logic rules into knowledge graph embedding, to transfer human knowledge to entity and relation embedding, and strengthen the learning process. In this paper, we propose a novel logic rule-enhanced method which can be easily integrated with any translation based knowledge graph embedding model, such as TransE. We first introduce a method to automatically mine the logic rules and corresponding confidences from the triples. Then, to put both triples and mined logic rules in the same semantic space, all triples in the knowledge graph are represented as first-order logic. Finally, we define several operations on the first-order logic and minimize a global loss over both the mined logic rules and the transformed first-order logic. We conduct extensive experiments for link prediction and triple classification on three datasets: WN18, FB166, and FB15K. Experiments show that the rule-enhanced method can significantly improve the performance of several baselines. The highlight of our model is that the filtered Hits@1, which is a pivotal evaluation in the knowledge inference task, shows a significant improvement (up to 700%).
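
A minimal sketch of the general recipe, with TransE as the base model: alongside the usual margin loss on triples, a mined composition rule with a confidence weight pulls the body relations' composed translation toward the head relation's. The paper's actual mechanism represents triples and rules in first-order logic with dedicated operations; the regularizer below is a simplified stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
E, R, dim = 8, 3, 16
ent = rng.normal(scale=0.1, size=(E, dim))
rel = rng.normal(scale=0.1, size=(R, dim))

triples = [(0, 0, 1), (1, 1, 2), (0, 2, 2), (3, 0, 4), (4, 1, 5), (3, 2, 5)]
# A mined composition rule with confidence: r0(x,y) & r1(y,z) => r2(x,z)
rules = [((0, 1, 2), 0.9)]

def train_step(lr=0.05, margin=1.0):
    for (h, r, t) in triples:
        t_neg = rng.integers(E)                      # corrupted tail
        d_pos = ent[h] + rel[r] - ent[t]
        d_neg = ent[h] + rel[r] - ent[t_neg]
        if margin + np.linalg.norm(d_pos) - np.linalg.norm(d_neg) > 0:
            g_pos = d_pos / (np.linalg.norm(d_pos) + 1e-9)
            g_neg = d_neg / (np.linalg.norm(d_neg) + 1e-9)
            ent[h] -= lr * (g_pos - g_neg)           # margin-loss gradients
            rel[r] -= lr * (g_pos - g_neg)
            ent[t] += lr * g_pos
            ent[t_neg] -= lr * g_neg
    # Rule term: in translation space, following r0 then r1 should land
    # where r2 lands, weighted by the mined confidence.
    for (r1, r2, r3), conf in rules:
        gap = rel[r1] + rel[r2] - rel[r3]
        rel[r1] -= lr * conf * gap
        rel[r2] -= lr * conf * gap
        rel[r3] += lr * conf * gap

for _ in range(200):
    train_step()
print("rule residual:", np.linalg.norm(rel[0] + rel[1] - rel[2]))
```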

Learning low-dimensional embeddings of knowledge graphs is a powerful approach used to predict unobserved or missing edges between entities. However, an open challenge in this area is developing techniques that can go beyond simple edge prediction and handle more complex logical queries, which might involve multiple unobserved edges, entities, and variables. For instance, given an incomplete biological knowledge graph, we might want to predict "what drugs are likely to target proteins involved with both diseases X and Y?" -- a query that requires reasoning about all possible proteins that might interact with diseases X and Y. Here we introduce a framework to efficiently make predictions about conjunctive logical queries -- a flexible but tractable subset of first-order logic -- on incomplete knowledge graphs. In our approach, we embed graph nodes in a low-dimensional space and represent logical operators as learned geometric operations (e.g., translation, rotation) in this embedding space. By performing logical operations within a low-dimensional embedding space, our approach achieves a time complexity that is linear in the number of query variables, compared to the exponential complexity required by a naive enumeration-based approach. We demonstrate the utility of this framework in two application studies on real-world datasets with millions of relations: predicting logical relationships in a network of drug-gene-disease interactions and in a graph-based representation of social interactions derived from a popular web forum.
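
A sketch of the query-embedding idea with fixed stand-ins for the learned components: relation projection as a translation and conjunction as an element-wise mean (the paper learns these geometric operators, and its intersection operator is a trained network). Answering a conjunctive query then costs one nearest-neighbour search instead of enumerating all bindings.

```python
import numpy as np

rng = np.random.default_rng(0)
E, R, dim = 10, 4, 32
ent = rng.normal(size=(E, dim))
rel = rng.normal(size=(R, dim))

def project(q, r):
    # Relation traversal as a translation in embedding space.
    return q + rel[r]

def intersect(*queries):
    # Conjunction as element-wise mean (a trained operator in the paper).
    return np.mean(queries, axis=0)

def answer(q, k=3):
    # Nearest entities to the final query embedding.
    d = np.linalg.norm(ent - q, axis=1)
    return np.argsort(d)[:k]

# "Which entities relate to disease X via r0 AND to disease Y via r1?"
x, y = 1, 2
q = intersect(project(ent[x], 0), project(ent[y], 1))
print("top candidates:", answer(q))   # linear in query variables, no enumeration
```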
