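Project title: Management and Query Optimization of Large-Scale Probabilistic Data
Project number: No.61202009
Project type: Young Scientists Fund (青年科学基金项目)
Approval year: 2013
Discipline: Computer Science
Principal investigator: 李建 (Li Jian)
Affiliation: 清华大学 (Tsinghua University)
Funding: 250,000 yuan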
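Chinese abstract (translated): Almost every decision problem inevitably involves some degree of uncertainty, such as noise introduced during data measurement or errors in parameter estimation. Generally speaking, a systematic way to handle uncertain data is to treat the data as random variables and then process and optimize over them according to probability theory. As the volume of uncertain data being generated keeps growing, processing and querying such data becomes increasingly difficult. We therefore need new database systems for random data and new query optimization algorithms. In recent years, probabilistic databases and optimization algorithms for stochastic input data have been an active and difficult research topic internationally, with many open challenges. In this project we plan to study the management of random data and query optimization algorithms in depth and systematically. Specifically, we plan to explore the following problems: (1) more efficient algorithms for SQL queries, ranking, range queries, and related problems over uncertain data; (2) streaming algorithms for uncertain data; (3) various optimization problems under uncertain input; (4) applications of algorithms for processing uncertain data, especially in emerging areas such as sensor network data monitoring and crowdsourcing.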
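Chinese keywords (translated): stochastic optimization; uncertain data; probabilistic model; approximation algorithms; combinatorial optimization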
English abstract: Uncertainty is inevitably involved in almost all important decision problems. Examples include noise generated in data measurement and errors produced in parameter estimation. Generally speaking, one systematic way to deal with uncertain data is to view the data as random variables and to manage and query the data according to probability theory. As the volume of probabilistic data generated increases drastically, handling and querying such data becomes increasingly difficult. Therefore, we need new database systems and query optimization algorithms to meet this challenge. Due to their significance and difficulty, building probabilistic databases and developing scalable and efficient query optimization algorithms have recently attracted a lot of attention from database and algorithm researchers. In this project, we aim to systematically investigate the problems of managing and querying large-scale probabilistic data. In particular, we plan to study the following concrete problems: (1) developing more efficient algorithms for processing SQL, ranking, and range queries over uncertain data; (2) developing streaming algorithms for uncertain data sets; (3) studying various optimization problems under uncertain input; (4) identifying new applications for probabilistic databases, especially in new application domains such as sensor network data monitoring and crowdsourcing.
English keywords: Stochastic Optimization; Uncertain Data; Probabilistic Model; Approximation Algorithms; Combinatorial Optimization