SKACK Stack is an open-source, full-stack platform for real-time analysis of big data. It consists of Apache Spark, Kubernetes, Akka, Apache Cassandra and Apache Kafka.
We selected GCP and GlusterFS as the storage solution because they support multi-mount (the same volume can be mounted by several pods at once) and the data is kept on all nodes of GlusterFS and GCP.
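As a rough illustration of what multi-mount means on the Kubernetes side, the sketch below builds a PersistentVolumeClaim manifest that requests the ReadWriteMany access mode. It is only a minimal sketch: the claim name, the size, and the storage class name "glusterfs" are hypothetical placeholders and are not taken from this project's configuration.

```python
import json


def build_pvc(name, size_gi, storage_class):
    """Return a PersistentVolumeClaim manifest as a plain dict.

    ReadWriteMany asks for a volume that many nodes can mount
    read-write at the same time (multi-mount).
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            # Hypothetical storage class name; use whatever class the cluster defines.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


if __name__ == "__main__":
    # Hypothetical claim name and size, for illustration only.
    manifest = build_pvc("skack-shared-data", 10, "glusterfs")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to kubectl apply -f, since kubectl accepts JSON manifests as well as YAML.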
Business Challenge
The goals of this project included:
Setting up a multi-node SKACK Stack cluster on Kubernetes, along with documentation of the setup.
Providing persistent storage: a container's filesystem is not persistent by default, so applications running in Kubernetes need persistent storage to keep their data.
Using Kubernetes to scale up Spark.
Using Kubernetes to scale up Cassandra.
Using Kubernetes to scale up Kafka.
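The scaling goals above can be sketched with the standard kubectl scale command. The helper below only builds the command lines and does not run them. The workload kinds, workload names, and replica counts are assumptions made for illustration (Cassandra and Kafka as StatefulSets, Spark workers as a Deployment); the actual resources in this project may differ.

```python
def kubectl_scale_command(kind, name, replicas):
    """Build the argument list for `kubectl scale KIND/NAME --replicas=N`."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return ["kubectl", "scale", f"{kind}/{name}", f"--replicas={replicas}"]


# Hypothetical workloads and target sizes, for illustration only.
TARGETS = [
    ("deployment", "spark-worker", 4),
    ("statefulset", "cassandra", 3),
    ("statefulset", "kafka", 3),
]

if __name__ == "__main__":
    for kind, name, replicas in TARGETS:
        # Print each command instead of executing it, so the sketch is safe to run anywhere.
        print(" ".join(kubectl_scale_command(kind, name, replicas)))
```

Each printed line can be run against a cluster by hand, or passed to subprocess.run once the names match real resources.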
Solution Offered
To overcome the challenges mentioned above, we set up a three-node on-premises Kubernetes cluster in which one node acts as the master and the other two as workers. It includes:
Kubernetes Master
Kubernetes Scheduler
Kubernetes Controller Manager
We used this setup to analyze the cluster and to report metrics to the API server for storage. These metrics cover resource utilization, availability, and performance.