Project Title: Multi-Speaker Tracking Based on Audio-Visual Feature Fusion in Intelligent Environments
Project Number: No.61263031
Project Type: Regional Science Fund Project
Approval Year: 2013
Discipline: Automation technology; computer technology
Principal Investigator: Cao Jie
Affiliation: Lanzhou University of Technology
Funding Amount: 440,000 CNY
Chinese Abstract: Building on the correlation and complementarity between audio and video information in intelligent environments, this project studies multi-speaker tracking based on audio-visual fusion in non-cooperative scenes. Through in-depth analysis of how the number of speakers and audio-video overlap affect the system model, it investigates the modeling of multi-speaker tracking systems in complex environments, explores methods for model interaction between cooperative and non-cooperative scenes, and, on the basis of a multi-speaker motion model, studies filtering methods for high-dimensional nonlinear systems. By analysing calibration methods between the microphone array and the cameras, it studies the mapping between the sensor coordinate systems and the Cartesian coordinate system, establishes a calibration mechanism relating the three-dimensional speaker position space to the two-dimensional video image, and, taking information entropy theory as a foundation, explores robust and efficient audio-visual information fusion methods. It further investigates in depth the relationship between system initialization time, initialization accuracy, and tracking accuracy, providing new ideas for establishing overall system performance evaluation metrics. As a frontier research topic in human-computer interaction, the project has wide applications in video conferencing systems, multimedia systems, and robotics; its results will further improve the application level of teleconferencing and automatic meeting analysis systems in China, and it has important application prospects and social value.
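The abstract names particle filtering for high-dimensional nonlinear systems and audio-visual fusion as core techniques but, being a project record, gives no implementation. The following is a minimal illustrative sketch, not the project's actual method: one bootstrap particle-filter step for a single speaker's 3D position, where the per-modality likelihood functions (audio_lik, video_lik) are hypothetical stand-ins for real sensor models and the fusion rule is a simple product of likelihoods rather than the entropy-based fusion the project proposes.

```python
import numpy as np

def particle_filter_step(particles, weights, audio_lik, video_lik, motion_std=0.05):
    """One bootstrap particle-filter update for a speaker's 3D position.

    particles : (N, 3) array of hypothesised speaker positions (metres)
    weights   : (N,) normalised importance weights
    audio_lik, video_lik : callables mapping an (N, 3) array of positions to
        per-particle likelihoods (placeholders for real sensor models, e.g.
        TDOA-based for the microphone array, detection-based for the camera)
    """
    n = len(particles)

    # 1. Predict: propagate particles with a simple random-walk motion model.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)

    # 2. Update: fuse the two modalities by multiplying their likelihoods.
    weights = weights * audio_lik(particles) * video_lik(particles)
    weights = weights + 1e-300          # guard against total degeneracy
    weights = weights / weights.sum()

    # 3. Resample when the effective sample size degenerates.
    ess = 1.0 / np.sum(weights ** 2)
    if ess < n / 2:
        idx = np.random.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)

    # Point estimate: weighted mean of the particle positions.
    estimate = np.average(particles, axis=0, weights=weights)
    return particles, weights, estimate
```

Tracking several speakers at once would multiply the state dimension, which is exactly the high-dimensional filtering difficulty the abstract highlights.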
Chinese Keywords: intelligent environment; speaker tracking; audio-visual fusion; evidence theory; particle filter
English Abstract: This project exploits the complementarity and correlation between audio and video information in intelligent environments to study the problem of tracking multiple speakers via audio-visual feature fusion in non-cooperative scenes. First, we analyse how the number of speakers and audio-video overlap affect the system model, study the modeling of multi-speaker tracking systems in complex environments, explore methods for model interaction between cooperative and non-cooperative scenes, and, building on a reasonable speaker motion model, investigate filtering methods for high-dimensional nonlinear systems. Second, we study the mapping between the sensor coordinate systems and the Cartesian coordinate system by analysing calibration methods between the microphone array and the cameras, establish a calibration mechanism relating the three-dimensional speaker position space to the two-dimensional video image, and explore robust and efficient audio-visual information fusion methods founded on information entropy theory. Third, we investigate the relationship between system initialization time, initialization accuracy, and tracking accuracy, providing new ideas for establishing an overall system performance evaluation framework. This project is a frontier topic in the field of human-computer interaction, with wide applications in video conferencing systems, multimedia systems, and robotics. The results of the study are expected to further improve the application level of teleconferencing and automatic meeting analysis systems in China, and have important application prospects and social value.
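The calibration mechanism relating the speakers' 3D position space to the 2D video image is not specified in the record. As a minimal sketch under the common assumption of a pinhole camera model, with an intrinsic matrix K and an extrinsic pose (R, t) obtained from camera calibration, the mapping from room (Cartesian) coordinates to pixel coordinates could look like this; all numeric values below are placeholders for illustration only.

```python
import numpy as np

def project_speaker_to_image(x_world, R, t, K):
    """Project a 3D speaker position (world frame, metres) to 2D pixel coordinates.

    x_world : (3,) position in the room / Cartesian coordinate system
    R, t    : rotation (3x3) and translation (3,) from world to camera frame
    K       : (3, 3) intrinsic camera matrix
    """
    x_cam = R @ x_world + t    # world -> camera coordinates
    uvw = K @ x_cam            # camera -> homogeneous pixel coordinates
    return uvw[:2] / uvw[2]    # perspective division -> (u, v) in pixels

# Placeholder calibration values, purely illustrative:
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])  # camera 2 m in front of the world origin
print(project_speaker_to_image(np.array([0.3, 0.1, 1.0]), R, t, K))
```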
English Keywords: intelligent environment; speaker tracking; audio-video fusion; evidence theory; particle filter