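Project title: Research on Key Technologies for Filtering Harmful Information in Printed Uyghur Document Images
Project number: No.61461049
Project type: Regional Science Fund Project
Year of approval: 2015
Discipline: Radio electronics; telecommunications technology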
Project author: 地里木拉提·吐尔逊
Author affiliation: Xinjiang University
Funding amount: 400,000 yuan
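Chinese abstract (translated): With Web data growing rapidly, how to collect data and discover information in it, how to analyze, understand, extract, organize, and process that information, how to obtain the latest information for specific needs, how to distinguish useful information from harmful information, and how to manage the spread of information online have together become a major challenge for information science and technology. Today in particular, when the "three forces" inside and outside the country use online information dissemination as their main channel for collusion, modern technical means must be used to monitor the spread of harmful information, locate it, and collect evidence; this is of real practical significance for ethnic unity and social stability in our region and for the long-term stability of the country. Building on a thorough survey of new theories, methods, and techniques for monitoring harmful information in Chinese and English, this project starts from the characteristics of the Uyghur language and script, stays close to practical application needs, and combines theoretical and empirical research to study the following key technologies: acquisition of printed Uyghur document images that appear in web pages, WeChat, Weibo, and other network applications; layout structure analysis; localization and extraction of text regions in document images with complex backgrounds; and searching for and matching keywords within the segmented text regions.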
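Chinese keywords (translated): document image; harmful information filtering; information content security; optical character recognition; printed Uyghur script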
English abstract: Faced with the rapid growth of Web data, how to collect and extract useful information, how to analyze, understand, organize, and process it, how to access the latest information according to specific requirements, how to distinguish useful information from sensitive or harmful information, and how to manage the transmission of information over the Internet have gradually become major challenges in current information science and technology. In particular, the "three forces" inside and outside the country use the network as their main channel for exchanging ideas, so modern scientific and technical means must be used to locate harmful information, collect forensic evidence, and monitor its spread; this has important practical significance for national unity and social stability. On the basis of a full investigation of English and Chinese sensitive-information monitoring technology and an in-depth study of new theories, methods, and technologies, and combining the characteristics of the Uyghur language with actual application requirements, this project will research new theory and technology for collecting Uyghur document images from the network and preprocessing them (extraction of document images from web pages, document image structure analysis), key technologies for Uyghur text area localization and extraction from document images, and keyword detection algorithms based on variable template matching, etc.
English keywords: Document Image; Keyword Spotting; Information Content Security; OCR; Printed Uyghur script