SQL Query Platform with Apache Hive, Presto

Overview

Presto, a distributed SQL query engine, lets us run interactive analytic queries, while Hive handles batch processing, against data sources ranging from gigabytes to petabytes. Presto can query tables registered in the Hive Metastore. Hive is optimized for query throughput, while Presto is optimized for latency.

Hive uses a pull model of data processing, whereas Presto uses a push model, like traditional DBMS implementations. Presto imposes memory limits on query tasks, and daily or weekly report queries require a large amount of memory, so Hive is the better fit for them.
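The split between the two engines can be pictured as a simple routing rule. The following is a minimal sketch in Python, not the platform's actual implementation: the name `choose_engine`, the `QueryJob` fields and the 50 GB threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-query memory budget for Presto, in GB (an assumption, not a real default).
PRESTO_MEMORY_LIMIT_GB = 50.0


@dataclass
class QueryJob:
    name: str
    estimated_memory_gb: float  # rough estimate supplied by the caller
    interactive: bool           # True for ad hoc / data discovery queries


def choose_engine(job: QueryJob, presto_limit_gb: float = PRESTO_MEMORY_LIMIT_GB) -> str:
    """Return "presto" for interactive queries that fit the memory budget, else "hive"."""
    if job.interactive and job.estimated_memory_gb <= presto_limit_gb:
        return "presto"
    # Batch reports and memory-heavy queries go to Hive, which favors throughput.
    return "hive"
```

Under these assumptions, a small ad hoc query would be routed to Presto, while a weekly report with a large memory estimate would be routed to Hive.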

Infrastructure automation with Ansible and Terraform handles auto-launching, auto-scaling and auto-healing of the Presto and Hive clusters on AWS On-Demand EC2 and AWS Spot Instances.

Problem Statement

The client was looking to build a data processing and query platform, with cluster management, for their organization.

The customer had large datasets on remote storage and wanted to use Presto for data discovery and Apache Hive on Tez for ETL jobs.

They were already using AWS Cloud but wanted infrastructure automation for cluster management and deployment of Presto and Hive on AWS Spot Instances.

Solution Offered

We offered a solution for a data processing and query platform with infrastructure automation:

It greatly simplifies, speeds up and scales Big Data Analytics workloads.

It processes your data from external storage using fast execution engines like Presto and Hive.

It runs large and complex queries.

It is cost effective: it uses AWS Spot Instances by default and heals the cluster when its size falls below the minimum cluster size.

It automatically scales the cluster up and down according to the CPU load.
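The healing and scaling behaviour described above can be sketched as a single decision function. This is a minimal illustration in Python under stated assumptions, not the Ansible and Terraform automation itself: the function name `desired_cluster_size`, the CPU thresholds, the size bounds and the step are all hypothetical values.

```python
def desired_cluster_size(
    current_size: int,
    cpu_load: float,            # average CPU utilisation across workers, 0.0 to 1.0
    min_size: int = 3,          # hypothetical minimum cluster size
    max_size: int = 20,         # hypothetical maximum cluster size
    scale_up_at: float = 0.75,  # hypothetical scale-up threshold
    scale_down_at: float = 0.30,  # hypothetical scale-down threshold
    step: int = 1,
) -> int:
    """Return the number of worker nodes the cluster should have."""
    # Healing: if nodes were lost (for example, reclaimed Spot Instances),
    # restore the cluster to its minimum size first.
    if current_size < min_size:
        return min_size
    # Scale up under high CPU load, without exceeding the maximum.
    if cpu_load > scale_up_at:
        return min(current_size + step, max_size)
    # Scale down under low CPU load, without dropping below the minimum.
    if cpu_load < scale_down_at:
        return max(current_size - step, min_size)
    return current_size
```

A scheduler could call this function periodically with the observed load and launch or terminate instances to match the returned size.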

