Today, data is being actively generated by a variety of devices, services, and applications. Such data is important not only for the information that it contains, but also for its relationships to other data and to interested users. Most existing Big Data systems focus on passively answering queries from users, rather than actively collecting data, processing it, and serving it to users. To satisfy both passive and active requests at scale, users need either to heavily customize an existing passive Big Data system or to glue multiple systems together. Either choice would require significant effort from users and incur additional overhead. In this paper, we present the BAD (Big Active Data) system, which is designed to preserve the merits of passive Big Data systems and introduce new features for actively serving Big Data to users at scale. We show the design and implementation of the BAD system, demonstrate how BAD facilitates providing both passive and active data services, investigate the BAD system's performance at scale, and illustrate the complexities that would result from instead providing BAD-like services with a "glued" system.