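Project title: Research on Scene Understanding Methods Based on Visual-Semantic Reasoning and Contextual Constraint Modeling
Project number: No.61272218
Project type: General Program
Year of approval: 2013
Discipline: Automation technology, computer technology
Principal investigator: Lu Tong (路通)
Affiliation: Nanjing University (南京大学)
Funding: 800,000 yuan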
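Abstract (translated from Chinese): Vision-based natural scene understanding is expected to remain a research hotspot and a major challenge over the coming years. Its goal is to effectively analyze, recognize, and represent the content of natural scene images and videos, and the relevant theories and algorithms are still at an early exploratory stage. Our preliminary experimental results indicate that research along three lines (visual-semantic reasoning about scenes, scene object recognition, and scene behavior pattern detection) can help build a novel mechanism for natural scene understanding. This project uses a topic model with max-margin training to label the semantics of the visible regions in a scene, and then exploits the relations between labels and visual features to infer indirect scene semantics by solving a constrained optimization problem with latent variables. To address the diversity and variability of objects in natural scene images, it builds multi-level contexts for natural scenes with conditional random fields and uses undirected graphs to represent multi-view associations among objects, improving the accuracy and robustness of object recognition. Finally, it describes local motion patterns in scene videos, fused with depth information, using three-dimensional Gaussian distributions, and models the spatio-temporal context among local motion patterns with a Markov random field, in order to explore new methods for behavior pattern detection in crowded scenes. The research will provide new ideas and techniques for natural scene understanding.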
Keywords (translated from Chinese): scene understanding; visual-semantic reasoning; context modeling
English abstract: Vision-based scene understanding is expected to be a research hotspot and a major challenge over the next decade. Its goal is to automatically and effectively analyze, recognize, and represent content directly from scene images or videos. However, little research has been reported so far, and its theories and algorithms are still at an early exploratory stage. Our experiments show that three approaches, namely visual-semantic reasoning, scene object recognition, and behavior detection, can provide a novel framework for natural scene understanding. We first label visual regions in a scene image using a max-margin based topic model, and then infer indirect semantics by optimizing the constraints on latent aspects, exploiting the relations between labels and visual features. Next, considering the diversity and variability of scene objects, we model multi-level scene contexts using Conditional Random Field (CRF) techniques and describe multi-view object constraints through a novel undirected graph representation. As a result, scene objects can be recognized more accurately and robustly. Finally, we use three-dimensional Gaussian distributions to describe local motion patterns in depth-integrated scene videos, and model spatio-temporal contexts through the Markov Random Field (
English keywords: scene understanding; visual-semantics reasoning; context modeling