Digitization and data-driven manufacturing processes are essential for today's industry. The term Industry 4.0 refers to the current wave of industrial digitization, defined as a new level of organization and control over the entire value chain across the product life cycle; it is geared towards increasingly individualized customer expectations of high quality. However, as the number of connected devices and the variety of data grow, it has become difficult to store and analyze data with conventional systems. The motivation of this paper is to provide an overview of the big data pipeline: real-time, on-premise data acquisition, data compression, data storage, and processing, implemented with Apache Kafka and Apache Spark on an Apache Hadoop cluster. It also identifies the challenges and issues that arose during implementation at Farplas, a manufacturing company that is one of the biggest Tier 1 automotive suppliers in Turkey, in order to study new trends and streams related to Industry 4.0.