Overview
SKACK Stack is an open source, full-stack platform for real-time analysis of big data. It consists of Apache Spark, Kubernetes, Akka, Apache Cassandra, and Apache Kafka.
We selected GCP and GlusterFS as the storage solution because it supports multi-mount (the same volume can be mounted by several pods at once) and data remains on all nodes of GlusterFS and GCP.
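To illustrate what multi-mount means on the Kubernetes side, here is a minimal sketch that builds a PersistentVolumeClaim manifest requesting the ReadWriteMany access mode, which is the mode that lets several pods mount one volume read-write. The claim name `skack-data`, the 10 Gi size, and the helper `shared_volume_claim` are assumptions made for illustration; they are not taken from the project.

```python
import json


def shared_volume_claim(name, size_gi, storage_class=None):
    """Build a PersistentVolumeClaim manifest that several pods can mount at once.

    All argument values are hypothetical; adapt them to the actual cluster.
    """
    spec = {
        # ReadWriteMany lets many nodes mount the volume read-write.
        "accessModes": ["ReadWriteMany"],
        "resources": {"requests": {"storage": f"{size_gi}Gi"}},
    }
    if storage_class is not None:
        # Only set when the cluster defines a matching StorageClass.
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    # Kubernetes accepts JSON manifests as well as YAML.
    print(json.dumps(shared_volume_claim("skack-data", 10), indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, provided the cluster has a volume backend that supports ReadWriteMany.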
Business Challenge
Goals of this project included:
- Setting up and documenting a multi-node cluster for the SKACK Stack on Kubernetes.
- Providing persistent storage: a container environment is not persistent by default, so applications in Kubernetes need persistent storage to store data.
- Using Kubernetes to scale up Spark.
- Using Kubernetes to scale up Cassandra.
- Using Kubernetes to scale up Kafka.
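The scaling goals above can be sketched as calls to `kubectl scale`, which changes the replica count of a workload. The snippet below only builds the command lines. The workload kinds and names (`deployment/spark-worker`, `statefulset/cassandra`, `statefulset/kafka`) and the replica counts are assumptions for illustration, since the original setup does not say how each component was deployed.

```python
import shlex


def scale_command(kind, name, replicas, namespace="default"):
    """Return the kubectl argument list that scales one workload."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return [
        "kubectl", "scale", f"{kind}/{name}",
        f"--replicas={replicas}",
        "--namespace", namespace,
    ]


if __name__ == "__main__":
    # Hypothetical workloads and target sizes.
    targets = [
        ("deployment", "spark-worker", 4),
        ("statefulset", "cassandra", 3),
        ("statefulset", "kafka", 3),
    ]
    for kind, name, replicas in targets:
        # Print instead of executing so the sketch is safe to run anywhere.
        print(shlex.join(scale_command(kind, name, replicas)))
```

To run a command for real, the argument list could be passed to `subprocess.run(..., check=True)` on a machine where `kubectl` is configured for the cluster.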
Solution Offered
To overcome the challenges mentioned above, we set up a three-node, on-premises Kubernetes cluster in which one node acts as the master and the other two as workers. It includes:
- Kubernetes Master
- Kubernetes Scheduler
- Kubernetes Controller Manager
We used this setup to analyze the cluster and report to the API server in order to store metrics on resource utilization, availability, and performance.
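As a rough illustration of the resource-utilization part of such metrics, the sketch below converts Kubernetes quantity strings (for example `250m` CPU or `512Mi` memory) into numbers and computes a utilization percentage. The helper names and sample values are assumptions, only a subset of quantity suffixes is handled, and this is not the metrics pipeline used in the project.

```python
# Subset of Kubernetes quantity suffixes; others are not handled here.
CPU_SUFFIXES = {"n": 1e-9, "u": 1e-6, "m": 1e-3}
MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '250m' or '2' into cores."""
    if quantity[-1] in CPU_SUFFIXES:
        return float(quantity[:-1]) * CPU_SUFFIXES[quantity[-1]]
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as '512Mi' into bytes."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # plain number of bytes


def utilization(used, capacity):
    """Return usage as a percentage of capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used / capacity


if __name__ == "__main__":
    # Hypothetical usage and capacity figures for one node.
    cpu_pct = utilization(parse_cpu("250m"), parse_cpu("2"))
    mem_pct = utilization(parse_memory("512Mi"), parse_memory("2Gi"))
    print(f"CPU {cpu_pct:.1f}%  memory {mem_pct:.1f}%")
```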