Electronic Health Records have become popular sources of data for secondary research, but their use is hampered by the amount of effort it takes to overcome the sparsity, irregularity, and noise that they contain. Modern learning architectures can remove the need for expert-driven feature engineering, but not the need for expert-driven preprocessing to abstract away the inherent messiness of clinical data. This preprocessing effort is often the dominant component of a typical clinical prediction project. In this work we propose using semantic embedding methods to directly couple the raw, messy clinical data to downstream learning architectures with truly minimal preprocessing. We examine this step from the perspective of capturing and encoding complex data dependencies in the data representation instead of in the model, which has the nice benefit of allowing downstream processing to be done with fast, lightweight, and simple models accessible to researchers without machine learning expertise. We demonstrate with three typical clinical prediction tasks that the highly compressed, embedded data representations capture a large amount of useful complexity, although in some cases the compression is not completely lossless.
In the $\ell$-Component Order Connectivity problem ($\ell$-COC, $\ell \in \mathbb{N}$), we are given a graph $G$ with $n$ vertices and $m$ edges, together with a non-negative integer $k$, and are asked whether there exists a set of vertices $S\subseteq V(G)$ such that $|S|\leq k$ and the size of the largest connected component in $G-S$ is at most $\ell$. In this paper, we give a linear programming based kernel for $\ell$-Component Order Connectivity with at most $2\ell k$ vertices that takes $n^{\mathcal{O}(\ell)}$ time for every constant $\ell$. Thereafter, we provide a separation oracle for the LP of $\ell$-COC, implying that the kernel only takes $(3e)^{\ell}\cdot n^{\mathcal{O}(1)}$ time. On the way to obtaining our kernel, we prove a generalization of the $q$-Expansion Lemma to weighted graphs. This generalization may be of independent interest.
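The problem definition above can be made concrete with a small checker. The sketch below only verifies a candidate set $S$ against the two conditions in the definition; it is not the LP-based kernel from the paper. The function name `is_coc_solution` and the edge-list input format are assumptions made for illustration.

```python
from collections import deque


def is_coc_solution(num_vertices, edges, removed, k, ell):
    """Check whether `removed` is a valid solution for ell-COC.

    Hypothetical helper: vertices are 0..num_vertices-1 and `edges` is a list
    of (u, v) pairs. Returns True when |removed| <= k and every connected
    component of G - removed has at most `ell` vertices.
    """
    removed = set(removed)
    if len(removed) > k:
        return False

    # Build adjacency lists for G - removed.
    adjacency = {v: [] for v in range(num_vertices) if v not in removed}
    for u, v in edges:
        if u in adjacency and v in adjacency:
            adjacency[u].append(v)
            adjacency[v].append(u)

    # Breadth-first search over each component, tracking its size.
    seen = set()
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            vertex = queue.popleft()
            size += 1
            if size > ell:
                return False
            for neighbour in adjacency[vertex]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return True
```

For example, on the path 0-1-2-3-4, removing the middle vertex leaves two components of size 2, so `{2}` is a solution for $k = 1$, $\ell = 2$, while removing an endpoint is not.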
Explainable artificial intelligence provides tools to better understand predictive models and their decisions, but many such methods are limited to producing insights with respect to a single class. When generating explanations for several classes, reasoning over them to obtain a comprehensive view may be difficult since they can present competing or contradictory evidence. To address this challenge we introduce the novel paradigm of multi-class explanations. We outline the theory behind such techniques and propose a local surrogate model based on multi-output regression trees -- called LIMEtree -- that offers faithful and consistent explanations of multiple classes for individual predictions while being post-hoc, model-agnostic and data-universal. On top of strong fidelity guarantees, our implementation delivers a range of diverse explanation types, including counterfactual statements favoured in the literature. We evaluate our algorithm with respect to explainability desiderata, through quantitative experiments and via a pilot user study, on image and tabular data classification tasks, comparing it to LIME, which is a state-of-the-art surrogate explainer. Our contributions demonstrate the benefits of multi-class explanations and wide-ranging advantages of our method across a diverse set of scenarios.
Forecasting graph-based, time-dependent data has broad practical applications but presents challenges. Effective models must capture both spatial and temporal dependencies in the data, while also incorporating auxiliary information to enhance prediction accuracy. In this paper, we identify limitations in current state-of-the-art models regarding temporal dependency handling. To overcome this, we introduce GSA-Forecaster, a new deep learning model designed for forecasting in graph-based, time-dependent contexts. GSA-Forecaster utilizes graph sequence attention, a new attention mechanism proposed in this paper, to effectively manage temporal dependencies. GSA-Forecaster integrates the data's graph structure directly into its architecture, addressing spatial dependencies. Additionally, it incorporates auxiliary information to refine its predictions further. We validate its performance using real-world graph-based, time-dependent datasets, where it demonstrates superior effectiveness compared to existing state-of-the-art models.
The Visual Domain Adaptation (VisDA) 2021 Challenge calls for unsupervised domain adaptation (UDA) methods that can deal with both input distribution shift and label set variance between the source and target domains. In this report, we introduce a universal domain adaptation (UniDA) method by aggregating several popular feature extraction and domain adaptation schemes. First, we utilize VOLO, a Transformer-based architecture with state-of-the-art performance in several visual tasks, as the backbone to extract effective feature representations. Second, we modify the open-set classifier of OVANet to recognize the unknown class with competitive accuracy and robustness. As shown on the leaderboard, our proposed UniDA method ranks 3rd in the VisDA 2021 Challenge, with 48.49% ACC and 70.8% AUROC.
In online markets, agents often learn from others' actions in addition to their private information. Such observational learning can lead to herding or information cascades in which agents eventually ignore their private information and "follow the crowd". Models for such cascades have been well studied for Bayes-rational agents that arrive sequentially and choose pay-off optimal actions. This paper additionally considers the presence of fake agents that take a fixed action in order to influence subsequent rational agents towards their preferred action. We characterize how the fraction of such fake agents impacts the behavior of rational agents given a fixed quality of private information. Our model results in a Markov chain with a countably infinite state space, for which we give an iterative method to compute an agent's chances of herding and its welfare (expected pay-off). Our main result shows a counter-intuitive phenomenon: there exist infinitely many scenarios where an increase in the fraction of fake agents in fact reduces the chances of their preferred outcome. Moreover, this increase causes a significant improvement in the welfare of every rational agent. Hence, this increase is not only counter-productive for the fake agents but is also beneficial to the rational agents.
DP-coloring was introduced by Dvo\v{r}\'{a}k and Postle as a generalization of list coloring and signed coloring. A new coloring, strictly $f$-degenerate transversal, is a further generalization of DP-coloring and $L$-forested-coloring. In this paper, we present some structural results on planar and toroidal graphs with forbidden configurations, and establish some sufficient conditions for the existence of strictly $f$-degenerate transversal based on these structural results. Consequently, (i) every toroidal graph without subgraphs isomorphic to the configurations in Fig.2 is DP-$4$-colorable, and has list vertex arboricity at most $2$, (ii) every toroidal graph without $4$-cycles is DP-$4$-colorable, and has list vertex arboricity at most $2$, (iii) every planar graph without subgraphs isomorphic to the configurations in Fig.3 is DP-$4$-colorable, and has list vertex arboricity at most $2$. These results improve upon previous results on DP-$4$-coloring [Discrete Math. 341~(7) (2018) 1983--1986; Bull. Malays. Math. Sci. Soc. 43~(3) (2020) 2271--2285] and (list) vertex arboricity [Discrete Math. 333 (2014) 101--105; Int. J. Math. Stat. 16~(1) (2015) 97--105; Iranian Math. Soc. 42~(5) (2016) 1293--1303].
The upsurge in pre-trained large models started by ChatGPT has swept across the entire deep learning community. Such powerful models demonstrate advanced generative ability and multimodal understanding capability, and have quickly set a new state of the art on a variety of benchmarks. The pre-trained LLM usually plays the role of a universal AI model that can conduct various tasks like article analysis and image comprehension. However, due to the prohibitively high memory and computational cost of implementing such a large model, conventional models (such as CNN and ViT) are still essential for many visual perception tasks. In this paper, we propose to enhance the representation ability of ordinary vision models on perception tasks (e.g. image classification) by taking advantage of off-the-shelf large pre-trained models. We present a new learning framework, dubbed GPT4Image, where the knowledge of the large pre-trained models is extracted to help CNNs and ViTs learn better representations and achieve higher performance. First, we curate a high-quality description set by prompting a multimodal LLM to generate descriptions for training images. Then, these detailed descriptions are fed into a pre-trained encoder to extract text embeddings that encode the rich semantics of images. During training, the text embeddings serve as an extra supervisory signal and are aligned with the image representations learned by vision models. The alignment process helps vision models achieve better performance with the aid of pre-trained LLMs. We conduct extensive experiments to verify the effectiveness of the proposed algorithm on various visual perception tasks for heterogeneous model architectures.
The Hegselmann-Krause (HK) model of opinion dynamics describes how opinions held by individuals in a community change over time in response to the opinions of others and their access to the true value, T, to which these opinions relate. Here, I extend the simple HK model to incorporate an Artificially Intelligent (AI) Oracle that averages the opinions of members of the community. Agent-based simulations show that (1) if individuals only have access to the Oracle (and not T), and incorporate the Oracle's opinion as they update their opinions, then all opinions will converge on a common value; (2) in contrast, if all individuals also have access to T, then all opinions will ultimately converge to T, but the presence of an Oracle may delay convergence; (3) if only some individuals have access to T, opinions may not converge to T, but under certain conditions, universal access to the Oracle will guarantee convergence to T; and (4) whether the Oracle accesses only the opinions of individuals who have access to T or the opinions of everyone in the community makes no marked difference to the extent to which the average opinion differs from T.
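To illustrate the kind of agent-based simulation described above, here is a minimal sketch of one possible HK update with an averaging Oracle. The abstract does not give the update rule, so the convex combination below, the weights `w_truth` and `w_oracle`, the confidence bound `eps`, and the function names are all assumptions, not the author's specification.

```python
def hk_step(opinions, eps, truth, w_truth, w_oracle):
    """One synchronous update of an assumed HK model with an averaging Oracle.

    Each agent mixes three signals: the true value (weight w_truth), the
    Oracle's opinion (weight w_oracle), and the mean opinion of peers within
    distance eps (remaining weight). The weighting scheme is hypothetical.
    """
    oracle = sum(opinions) / len(opinions)  # Oracle averages all opinions
    w_peer = 1.0 - w_truth - w_oracle
    updated = []
    for x in opinions:
        # Bounded confidence: only peers within eps count (includes x itself).
        peers = [y for y in opinions if abs(y - x) <= eps]
        peer_avg = sum(peers) / len(peers)
        updated.append(w_truth * truth + w_oracle * oracle + w_peer * peer_avg)
    return updated


def simulate(opinions, steps, eps=0.2, truth=0.7, w_truth=0.0, w_oracle=0.3):
    """Run the assumed update rule for a fixed number of steps."""
    for _ in range(steps):
        opinions = hk_step(opinions, eps, truth, w_truth, w_oracle)
    return opinions


if __name__ == "__main__":
    start = [i / 9 for i in range(10)]  # ten opinions spread over [0, 1]
    oracle_only = simulate(start, steps=100, w_truth=0.0, w_oracle=0.3)
    with_truth = simulate(start, steps=300, w_truth=0.1, w_oracle=0.3)
    print("spread with Oracle only:", max(oracle_only) - min(oracle_only))
    print("max distance to T with truth access:",
          max(abs(x - 0.7) for x in with_truth))
```

Under this assumed rule, setting `w_truth=0.0` shrinks the spread of opinions by a factor of at least `1 - w_oracle` per step, so opinions reach a common value, while any positive `w_truth` for all agents pulls every opinion towards T. This mirrors findings (1) and (2) qualitatively only.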
Large Language Models (LLMs) have revolutionized the field of natural language processing, achieving unprecedented performance across a variety of applications. However, their increased computational and memory demands present significant challenges, especially when handling long sequences. This paper focuses on the long-context scenario, addressing the inefficiencies in KV cache memory consumption during inference. Unlike existing approaches that optimize the memory based on the sequence length, we identify substantial redundancy in the channel dimension of the KV cache, as indicated by an uneven magnitude distribution and a low-rank structure in the attention weights. In response, we propose ThinK, a novel query-dependent KV cache pruning method designed to minimize attention weight loss while selectively pruning the least significant channels. Our approach not only maintains or enhances model accuracy but also achieves a reduction in KV cache memory costs by over 20% compared with vanilla KV cache eviction and quantization methods. For instance, ThinK integrated with KIVI can achieve a 2.8x reduction in peak memory usage while maintaining nearly the same quality, enabling up to a 5x increase in batch size when using a single GPU. Extensive evaluations on the LLaMA and Mistral models across various long-sequence datasets verified the efficiency of ThinK, establishing a new baseline algorithm for efficient LLM deployment without compromising performance. Our code has been made available at https://github.com/SalesforceAIResearch/ThinK.
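The idea of query-dependent channel pruning can be sketched in a few lines. The abstract does not state the scoring rule, so the criterion below is an assumption: channel $i$ is scored by the Frobenius norm of its contribution $Q_{:,i} K_{:,i}^\top$ to the attention logits, which equals $\lVert Q_{:,i}\rVert \cdot \lVert K_{:,i}\rVert$. The function names and the plain nested-list representation are hypothetical; refer to the linked repository for the actual implementation.

```python
import math


def channel_scores(queries, keys):
    """Score each key channel by its assumed contribution to Q K^T.

    `queries` and `keys` are lists of rows (one row per token), each row a
    list of per-channel floats. For channel i the score is
    ||Q[:, i]|| * ||K[:, i]||, the Frobenius norm of the rank-one matrix
    Q[:, i] K[:, i]^T. This criterion is an illustrative assumption.
    """
    num_channels = len(keys[0])
    scores = []
    for i in range(num_channels):
        q_norm = math.sqrt(sum(row[i] ** 2 for row in queries))
        k_norm = math.sqrt(sum(row[i] ** 2 for row in keys))
        scores.append(q_norm * k_norm)
    return scores


def prune_key_channels(queries, keys, keep):
    """Keep the `keep` highest-scoring channels of the cached keys.

    Returns the sorted indices of the kept channels and the pruned key rows.
    """
    scores = channel_scores(queries, keys)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    pruned_keys = [[row[i] for i in kept] for row in keys]
    return kept, pruned_keys
```

In such a scheme, later queries would have to be restricted to the kept channel indices before the dot product with the pruned keys, so the indices are returned alongside the pruned cache.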