Recent studies demonstrate the use of a two-stage supervised framework to generate images that depict human perception of visual stimuli from EEG, a task referred to as EEG-visual reconstruction. They are, however, unable to reproduce the exact visual stimulus, since it is the human-specified annotation of images, not the image data, that determines what the synthesized images are. Moreover, synthesized images often suffer from noisy EEG encodings and unstable training of generative models, making them hard to recognize. Instead, we present a single-stage EEG-visual retrieval paradigm in which the data of the two modalities are correlated, as opposed to their annotations, allowing us to recover the exact visual stimulus for an EEG clip. We maximize the mutual information between the EEG encoding and the associated visual stimulus by optimizing a contrastive self-supervised objective, which brings two additional benefits. First, it enables EEG encodings to handle visual classes beyond those seen during training, since learning is not directed at class annotations. Second, the model is no longer required to generate every detail of the visual stimulus, but rather focuses on cross-modal alignment and retrieves images at the instance level, ensuring distinguishable model output. Empirical studies are conducted on the largest single-subject EEG dataset that measures brain activity evoked by image stimuli. We demonstrate that the proposed approach completes an instance-level EEG-visual retrieval task that existing methods cannot. We also examine the implications of a range of EEG and visual encoder structures. Furthermore, for the widely studied semantic-level EEG-visual classification task, despite not using class annotations, the proposed method outperforms state-of-the-art supervised EEG-visual reconstruction approaches, particularly in open-class recognition.
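The abstract does not state the exact contrastive objective, so the following is only a minimal sketch assuming an InfoNCE-style loss, a common choice whose minimization maximizes a lower bound on mutual information. The function names, the temperature value, and the toy embeddings are illustrative assumptions and are not taken from the paper.

```python
import math

def l2_normalize(v):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(eeg_embs, img_embs, temperature=0.1):
    """Mean cross-entropy of matching each EEG clip to its own image in the batch.

    Row i of eeg_embs and row i of img_embs are assumed to be a positive pair;
    every other image in the batch acts as a negative.
    """
    eeg = [l2_normalize(v) for v in eeg_embs]
    img = [l2_normalize(v) for v in img_embs]
    total = 0.0
    for i, e in enumerate(eeg):
        logits = [sum(a * b for a, b in zip(e, m)) / temperature for m in img]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(eeg)

def retrieve(eeg_emb, img_embs):
    # Instance-level retrieval: index of the image most similar to the EEG encoding.
    e = l2_normalize(eeg_emb)
    scores = [sum(a * b for a, b in zip(e, l2_normalize(m))) for m in img_embs]
    return max(range(len(scores)), key=scores.__getitem__)
```

Under this sketch, retrieval needs no class labels: an EEG encoding is simply matched to the nearest image embedding, which is why unseen classes pose no structural obstacle.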

SNS providers are known to recompress and resize uploaded videos and images, but most conventional methods for detecting tampered videos and images are not robust enough against such operations. In addition, videos can be manipulated temporally, for example by inserting new frames or permuting existing frames, and such operations are difficult to detect with conventional methods. Accordingly, in this paper, we propose a novel method with a robust hashing algorithm for detecting temporally manipulated videos even when resizing and compression are applied to the videos.
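The abstract does not describe the hashing algorithm itself. As a rough illustration of the general idea only, the sketch below swaps in a simple block-average hash (not the authors' method) and compares per-frame hashes against a reference video to flag inserted or reordered frames. All names, the grid size, and the distance threshold are hypothetical.

```python
def average_hash(frame, grid=4):
    """Hash a grayscale frame (2D list, at least grid x grid pixels).

    The frame is averaged down to grid x grid cells and each cell is thresholded
    at the mean, so uniform resizing leaves the hash largely unchanged.
    """
    h, w = len(frame), len(frame[0])
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            ys = range(gy * h // grid, (gy + 1) * h // grid)
            xs = range(gx * w // grid, (gx + 1) * w // grid)
            block = [frame[y][x] for y in ys for x in xs]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return tuple(int(c > mean) for c in cells)

def hamming(a, b):
    # Number of differing bits between two equal-length hashes.
    return sum(x != y for x, y in zip(a, b))

def temporal_report(reference, query, max_dist=2):
    """Return (indices of query frames with no reference match, reordered flag)."""
    ref_hashes = [average_hash(f) for f in reference]
    matches = []
    for f in query:
        q = average_hash(f)
        dists = [hamming(q, r) for r in ref_hashes]
        best = min(range(len(dists)), key=dists.__getitem__)
        matches.append(best if dists[best] <= max_dist else None)
    inserted = [i for i, m in enumerate(matches) if m is None]
    matched = [m for m in matches if m is not None]
    # Matched reference indices should never decrease if frame order is intact.
    reordered = any(a > b for a, b in zip(matched, matched[1:]))
    return inserted, reordered
```

A small Hamming tolerance is what gives such a scheme some slack against recompression noise; a real system would need a hash designed and validated for that purpose.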

Network Slicing (NS) is crucial for efficiently enabling divergent network applications in next-generation networks. Nonetheless, the complex Quality of Service (QoS) requirements and the heterogeneity of network services entail high computational time for Network Slice Provisioning (NSP) optimization. Legacy optimization methods struggle to meet the low-latency and high-reliability requirements of network applications. To this end, we model real-time NSP as an Online Network Slice Provisioning (ONSP) problem. Specifically, we formulate the ONSP problem as an online Multi-Objective Integer Programming Optimization (MOIPO) problem. Then, we approximate the solution of the MOIPO problem by applying the Proximal Policy Optimization (PPO) method to the traffic demand prediction. Our simulation results show the effectiveness of the proposed method compared to state-of-the-art MOIPO solvers, with a lower Service Level Agreement (SLA) violation rate and lower network operation cost.
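The abstract names PPO without giving its formulation for the slicing problem. For orientation, here is a minimal sketch of the standard PPO clipped surrogate objective; how states, actions, and advantages map onto slice provisioning is not specified in the abstract, so the inputs below are generic placeholders.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Average clipped surrogate objective over a batch of sampled actions.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)                      # probability ratio
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Taking the minimum removes any incentive to push the ratio
        # outside the [1 - eps, 1 + eps] trust interval.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The objective is maximized with respect to the current policy parameters; the clipping is what keeps each update close to the policy that collected the data.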

Image captioning models are usually trained on human-annotated ground-truth captions, which can lead them to generate accurate but generic captions. In this paper, we focus on generating distinctive captions that can distinguish the target image from other similar images. To evaluate the distinctiveness of captions, we introduce a series of metrics that use the large-scale vision-language pre-trained model CLIP to quantify distinctiveness. To further improve the distinctiveness of captioning models, we propose a simple and effective training strategy that trains the model by comparing the target image with a group of similar images and optimizing the group embedding gap. Extensive experiments are conducted on various baseline models to demonstrate the wide applicability of our strategy and the consistency of the metric results with human evaluation. By comparing the performance of our best model with existing state-of-the-art models, we claim that our model achieves a new state of the art on the distinctiveness objective.

Instance segmentation on 3D point clouds has been attracting increasing attention due to its wide applications, especially in scene understanding. However, most existing methods require training data to be fully annotated. Manually preparing ground-truth labels at the point level is very cumbersome and labor-intensive. To address this issue, we propose a novel weakly supervised method, RWSeg, that only requires labeling one object with one point. With these sparse weak labels, we introduce a unified framework with two branches to propagate semantic and instance information, respectively, to unknown regions, using self-attention and random walk. Furthermore, we propose a Cross-graph Competing Random Walks (CGCRW) algorithm, which encourages competition among different instance graphs to resolve ambiguities in closely placed objects and improve the performance of instance assignment. RWSeg can generate high-quality instance-level pseudo labels. Experimental results on the ScanNet-v2 and S3DIS datasets show that our approach achieves performance comparable to fully supervised methods and outperforms previous weakly supervised methods by large margins. This is the first work that bridges the gap between weak and full supervision in the area.
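To make the idea of spreading one-point labels by random walk concrete, the sketch below implements plain random-walk label propagation on a small graph. It is a generic textbook-style scheme, not the CGCRW algorithm, whose details the abstract does not give; the function name, `alpha`, and the iteration count are assumptions.

```python
def propagate_labels(adjacency, seeds, alpha=0.9, iters=200):
    """Spread sparse seed labels over a graph with a damped random walk.

    adjacency: square list of non-negative edge weights between points.
    seeds: dict mapping a few node indices to labels (one point per object).
    Returns the highest-scoring label for every node.
    """
    n = len(adjacency)
    labels = sorted(set(seeds.values()))
    # Row-normalize edge weights into random-walk transition probabilities.
    trans = []
    for row in adjacency:
        s = sum(row)
        trans.append([w / s if s else 0.0 for w in row])
    init = [[1.0 if seeds.get(i) == lab else 0.0 for lab in labels] for i in range(n)]
    scores = [r[:] for r in init]
    for _ in range(iters):
        # Each node mixes its neighbours' scores with its own seed evidence.
        scores = [
            [alpha * sum(trans[i][j] * scores[j][c] for j in range(n))
             + (1.0 - alpha) * init[i][c]
             for c in range(len(labels))]
            for i in range(n)
        ]
    return [labels[max(range(len(labels)), key=row.__getitem__)] for row in scores]
```

In this plain version each label diffuses independently, so points between two nearby objects are decided only by graph distance to the seeds; the competition among instance graphs described in the abstract targets exactly that ambiguity.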

With deep learning (DL) outperforming conventional methods on many tasks, much effort has been devoted to utilizing DL in various domains. Researchers and developers in the traffic domain have also designed and improved DL models for forecasting tasks such as estimating traffic speed and time of arrival. However, analyzing DL models remains challenging due to their black-box nature and the complexity of traffic data (i.e., spatio-temporal dependencies). Collaborating with domain experts, we design a visual analytics system, AttnAnalyzer, that enables users to explore how DL models make predictions by allowing effective spatio-temporal dependency analysis. The system incorporates dynamic time warping (DTW) and Granger causality tests for computational spatio-temporal dependency analysis, while providing map, table, line chart, and pixel views to assist users in performing dependency and model behavior analysis. For the evaluation, we present three case studies showing how AttnAnalyzer can be used to effectively explore model behaviors and improve model performance in two different road networks. We also provide domain expert feedback.
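For readers unfamiliar with dynamic time warping, a minimal sketch of the classic DTW distance between two one-dimensional series (for example, speed readings from two road segments) follows. How AttnAnalyzer configures or uses DTW is not described in the abstract, so this is only the textbook recurrence with an absolute-difference cost.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the best of: repeat b's item, repeat a's item, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because items may be repeated during alignment, two series with the same shape but a time lag or different sampling lengths can still have a small distance, which is what makes DTW useful for comparing traffic patterns that propagate along a road network with delay.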

As the scale of problems and data used for experimental design, signal processing, and data assimilation grows, the oft-occurring least squares subproblems are correspondingly growing in size. As the scale of these least squares problems creates prohibitive memory movement costs for the usual incremental QR and Krylov-based algorithms, randomized least squares solvers are garnering more attention. However, these randomized least squares solvers are difficult to integrate into application algorithms, as their uncertainty limits practical tracking of algorithmic progress and reliable stopping. Accordingly, in this work, we develop theoretically rigorous, practical tools for quantifying the uncertainty of an important class of iterative randomized least squares algorithms, which we then use to track algorithmic progress and create a stopping condition. We demonstrate the effectiveness of our algorithm by solving a 0.78 TB least squares subproblem from the inner loop of incremental 4D-Var using only 195 MB of memory.
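The abstract does not say which randomized algorithms or which stopping rule are used. As an illustration of the setting only, the sketch below uses randomized Kaczmarz, a well-known row-action solver that touches one row per step, together with a simple moving-window average of sampled squared residuals as a stand-in progress tracker. The window size, tolerance, and function name are assumptions, and this heuristic carries none of the theoretical guarantees the abstract refers to.

```python
import random
from collections import deque

def randomized_kaczmarz(A, b, window=20, tol=1e-12, max_iters=10000, seed=0):
    """Solve a consistent system A x = b one random row at a time.

    Progress is tracked with the mean of the last `window` sampled squared
    residuals, which needs no pass over the full matrix.
    Returns (solution estimate, iterations used).
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    recent = deque(maxlen=window)
    for it in range(1, max_iters + 1):
        i = rng.randrange(len(A))
        row = A[i]
        residual = b[i] - sum(r * xj for r, xj in zip(row, x))
        recent.append(residual * residual)
        # Stop once the windowed residual estimate is small enough.
        if len(recent) == window and sum(recent) / window < tol:
            return x, it
        norm_sq = sum(r * r for r in row)
        if norm_sq > 0.0:
            # Project the iterate onto the hyperplane defined by the sampled row.
            step = residual / norm_sq
            x = [xj + step * r for xj, r in zip(x, row)]
    return x, max_iters
```

Only the current iterate, one row, and the short residual window are held in memory, which hints at why such methods suit problems far larger than RAM. The difficulty the abstract highlights is that a windowed estimate like this is itself random, so deciding when it can be trusted requires quantifying its uncertainty.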

Our aim is to develop a better understanding of how the Point of Release (PoR) of a ball affects the perception of animated throwing motions. We present the results of a perceptual study in which participants viewed animations of a virtual human throwing a ball, with the PoR modified to be early or late. We found that errors in overarm throws with a late PoR are detected more easily than those with an early PoR, while the opposite is true for underarm throws. The viewpoint and the distance the ball travels also have an effect on perceived realism. The results of this research can help improve the plausibility of throwing animations in interactive applications such as games or VR.

Personalised interactive systems such as recommender systems require selecting relevant items dependent on context. Production systems need to identify these items rapidly from very large catalogues, a task that can be solved efficiently using maximum inner product search technology. Offline optimisation of maximum inner product search can be achieved by a relaxation of the discrete problem, resulting in policy learning or REINFORCE-style learning algorithms. Unfortunately, this relaxation step requires computing a sum over the entire catalogue, making the complexity of evaluating the gradient (and hence of each stochastic gradient descent iteration) linear in the catalogue size. This calculation is untenable in many real-world settings such as large-catalogue recommender systems, severely limiting the usefulness of this method in practice. In this paper, we show how it is possible to produce an excellent approximation of these policy learning algorithms that scales logarithmically with the catalogue size. Our contribution is based upon combining three novel ideas: a new Monte Carlo estimate of the gradient of a policy, the self-normalised importance sampling estimator, and the use of fast maximum inner product search at training time. Extensive experiments show that our algorithm is an order of magnitude faster than naive approaches yet produces equally good policies.
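To illustrate why self-normalised importance sampling avoids the full-catalogue sum, the sketch below estimates an expectation under a softmax policy from a handful of sampled items. It assumes a softmax policy over item scores and an arbitrary proposal distribution; the paper's actual gradient estimator and its use of maximum inner product search to build the proposal are not reproduced here, and all names are illustrative.

```python
import math

def exact_expectation(scores, values):
    # Reference computation: touches every item, so cost is linear in catalogue size.
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def snis_expectation(scores, values, sampled, proposal_probs):
    """Self-normalised importance sampling estimate of E_{a ~ softmax(scores)}[values[a]].

    sampled: item indices drawn from the proposal distribution.
    proposal_probs: proposal probability of each item (only sampled ones are read).
    The softmax normalising constant cancels in the ratio, so it is never computed.
    """
    peak = max(scores[a] for a in sampled)
    weights = [math.exp(scores[a] - peak) / proposal_probs[a] for a in sampled]
    return sum(w * values[a] for w, a in zip(weights, sampled)) / sum(weights)
```

The estimate only reads scores for the sampled items, so its cost depends on the sample size rather than the catalogue size. Its quality depends on how well the proposal covers the high-scoring items, which is presumably where a fast top-item search at training time becomes useful.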

Embedding ethics modules within computer science courses has become a popular response to the growing recognition that CS programs need to better equip their students to navigate the ethical dimensions of computing technologies like AI, machine learning, and big data analytics. However, the popularity of this approach has outpaced the evidence of its positive outcomes. To help close that gap, this empirical study reports positive results from Northeastern's program that embeds values analysis modules into CS courses. The resulting data suggest that such modules have a positive effect on students' moral attitudes and that students leave the modules believing they are more prepared to navigate the ethical dimensions they'll likely face in their eventual careers. Importantly, these gains were accomplished at an institution without a philosophy doctoral program, suggesting this strategy can be effectively employed by a wider range of institutions than many have thought.
