Introduction
Building a real-time streaming data pipeline for data ingestion from different sources using Apache NiFi, Apache Kafka, Apache Spark, and Apache Cassandra.
Apache NiFi provides a web UI dashboard and helps automate the data workflow.
Business Challenge
- Benchmarking the data pipeline built on NiFi and Kafka by message size and duration.
- Real-time streaming with memory management, scalability, and concurrency.
- An interactive dashboard with real-time data analytics and visualization using D3.js charts and React.js.
- End-to-end delivery guarantees and error handling for data moving from the Twitter agent to the processing engine.
- Test data consists of Apache Hadoop cluster logs and the Twitter Streaming APIs.
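The benchmarking goal above (message size against duration) can be illustrated with a minimal sketch. It is not the harness used in this project: the function name `benchmark_sizes` and its parameters are hypothetical, and `send` stands in for whatever actually publishes a message (for example a Kafka producer call), so the numbers from the in-memory sink below say nothing about real NiFi or Kafka throughput.

```python
import time
from typing import Callable, Dict, Iterable, List


def benchmark_sizes(
    send: Callable[[bytes], None],
    sizes: Iterable[int],
    messages_per_size: int = 1000,
) -> List[Dict[str, float]]:
    """Time `send` for each payload size and report duration and throughput."""
    results = []
    for size in sizes:
        payload = b"x" * size  # synthetic payload of exactly `size` bytes
        start = time.perf_counter()
        for _ in range(messages_per_size):
            send(payload)
        seconds = time.perf_counter() - start
        results.append({
            "size_bytes": size,
            "messages": messages_per_size,
            "seconds": seconds,
            # Guard against a zero reading from a very fast in-memory sink.
            "messages_per_second": (
                messages_per_size / seconds if seconds > 0 else float("inf")
            ),
        })
    return results


if __name__ == "__main__":
    sink = []  # in-memory stand-in for a real producer (assumption)
    rows = benchmark_sizes(sink.append, sizes=[100, 1_000, 10_000],
                           messages_per_size=500)
    for row in rows:
        print(row)
```

Passing the send function in as a parameter keeps the timing loop independent of the transport, so the same loop could be pointed at a NiFi endpoint or a Kafka producer for comparison.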
Solution Offered For Real-Time Streaming Data Pipeline
The real-time streaming platform was built in two configurations: Apache NiFi as both collector and producer for data ingestion, and Apache NiFi as collector with Apache Kafka as producer. Both configurations were used with Apache Spark Streaming and Apache Spark Structured Streaming.
Apache Cassandra is deployed as a cluster, both in a microservices architecture on Kubernetes and on EC2 instances, for scaling and guaranteed delivery of data across the data pipeline.
Real-Time Streaming Architecture for Data Pipeline Components
- Automated Data Workflow - Apache NiFi
- Messaging System - Apache Kafka
- Stream Processing Engine - Apache Spark Streaming
- REST API and Twitter Dashboard for Real-Time Tweets
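As a rough illustration of how the Kafka, Spark, and Cassandra components could be wired together, here is a minimal PySpark Structured Streaming sketch. It is not the project's actual job. The function names, topic, keyspace, table, and checkpoint path are hypothetical, and the JSON fields `id_str` and `text` are an assumption about the tweet payload. It also assumes the Spark Kafka source package and the Spark Cassandra Connector are on the classpath, and that the target Cassandra table has `id` and `text` columns.

```python
from typing import Dict


def kafka_source_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Options for Spark's Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def start_tweet_stream(bootstrap_servers: str, topic: str,
                       keyspace: str, table: str, checkpoint_dir: str):
    """Read tweets from Kafka and append each micro-batch to Cassandra."""
    # Imported here so the helper above stays usable without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, get_json_object

    spark = SparkSession.builder.appName("tweet-pipeline-sketch").getOrCreate()

    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )

    # Kafka delivers the message body as bytes in the `value` column.
    tweets = (
        raw.select(col("value").cast("string").alias("json"))
        .select(
            get_json_object(col("json"), "$.id_str").alias("id"),    # assumed field
            get_json_object(col("json"), "$.text").alias("text"),    # assumed field
        )
        .where(col("id").isNotNull())  # drop records without an id
    )

    def write_batch(batch_df, batch_id):
        # Batch write through the Spark Cassandra Connector data source.
        (batch_df.write.format("org.apache.spark.sql.cassandra")
         .options(keyspace=keyspace, table=table)
         .mode("append")
         .save())

    return (
        tweets.writeStream.foreachBatch(write_batch)
        .option("checkpointLocation", checkpoint_dir)  # offsets survive restarts
        .start()
    )


# Example call (requires a running Kafka broker and Cassandra cluster):
# query = start_tweet_stream("localhost:9092", "tweets", "demo", "tweets",
#                            "/tmp/tweet-checkpoint")
# query.awaitTermination()
```

With a checkpoint location, `foreachBatch` gives at-least-once writes by default, so a batch can be replayed after a failure. Keying the Cassandra table on the tweet id would make such replays overwrite rather than duplicate rows.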