Overview

 

StreamSets provides a platform for building, executing, and managing data flows for batch and streaming data. StreamSets Data Collector offers drag-and-drop connectors for batch and streaming sources and destinations, and it supports data ingestion, data pipeline monitoring, and error detection.

 

It supports real-time data ingestion and processing with change data capture (CDC) for extraction, transformation, and loading in ETL applications.

 

In the present case, StreamSets is used for data ingestion and CDC of real-time tweets from the Twitter APIs, and for data migration from MySQL through a data pipeline built on Kafka and Amazon Redshift.

 

Business Challenge

 

  • Create a real-time Twitter stream into an Amazon Redshift cluster.

  • Build a data pipeline to migrate data from MySQL through Kafka to Amazon Redshift.

  • Implement a change data capture mechanism to capture changes in any data source.

  • Build a data pipeline to fetch Google Analytics data and send the stream to Amazon Redshift.

 

Solution Offered

 

  • Data ingestion is performed using StreamSets Data Collector, which streams data in real time.

  • There are two ways to stream data to Amazon Redshift:

    • Using a connection pool - Use JDBC Producer as the destination and connect to Redshift with its connection string.

    • Using a Kinesis Firehose stream - First configure a Kinesis Firehose stream that uses an Amazon S3 bucket as intermediate storage and then uses the COPY command to transfer the data to the Amazon Redshift cluster.
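
The two Redshift routes above can be sketched in a few lines of Python. This is a minimal illustration, not the configuration used in this project: the cluster endpoint, database, table, S3 prefix, and IAM role below are hypothetical placeholders, and the JSON 'auto' format option is an assumption about how the staged data is encoded. The first helper builds the kind of connection string a JDBC destination would be given; the second builds a COPY statement of the kind that loads data staged in S3 into Redshift.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedshiftTarget:
    """Where the pipeline writes. All values used below are placeholders."""
    host: str
    port: int
    database: str
    table: str


def jdbc_url(target: RedshiftTarget) -> str:
    """Connection string for the connection-pool (JDBC) route."""
    return f"jdbc:redshift://{target.host}:{target.port}/{target.database}"


def copy_command(target: RedshiftTarget, s3_prefix: str, iam_role_arn: str) -> str:
    """COPY statement for the Firehose route, which stages records in S3 first.

    Assumption: the staged records are JSON, so FORMAT AS JSON 'auto' is used.
    """
    return (
        f"COPY {target.table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"FORMAT AS JSON 'auto';"
    )


# Hypothetical target: 5439 is the default Redshift port, everything else is made up.
EXAMPLE_TARGET = RedshiftTarget(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    database="analytics",
    table="tweets",
)

if __name__ == "__main__":
    print(jdbc_url(EXAMPLE_TARGET))
    print(
        copy_command(
            EXAMPLE_TARGET,
            s3_prefix="s3://example-bucket/firehose/tweets/",
            iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
        )
    )
```

With the connection-pool route, records are written to Redshift directly over JDBC. With the Firehose route, the COPY step loads whatever has accumulated in the S3 bucket, so the load is driven by the staged files.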
