Presto’s New Foundation Signals Growth for the Big Data SQL Engine


Speed, Concurrency

Java-based Presto was created at Facebook as a faster alternative to Hive, which was used to run SQL queries against Hadoop. Presto could also query across Hive and MySQL, of which Facebook was also a large-scale user.

Since it was open sourced in 2013, it has gained the ability to query data sources both on-premises and in the cloud, including HDFS, Amazon S3, Kafka, Cassandra, Postgres, Oracle and Redis.

Presto, which can run full ANSI SQL, is designed for high performance, high concurrency and low latency. Airbnb, Netflix, Treasure Data and Uber were early adopters and contributors, and are among the companies running hundreds of nodes against petabytes of data.

Part of the performance gain comes from not using MapReduce, which writes intermediate results back to disk. Instead, Presto compiles parts of the query on the fly and processes data in memory. That approach comes with limited fault tolerance, Treasure Data warns.

Video: Presto Fast SQL on Anything (YouTube): https://www.youtube.com/watch?v=2J7Amu1UtsU

Presto separates compute and storage.

“Presto thinks of databases or other places you store data as simply storage, rather than being its own database,” Borgman said.

One of Starburst’s clients, a media company, stores viewing data in Hadoop and billing data in Teradata. Presto sees those as just two places data is stored, and as an abstraction layer above these different data sources, allows users to query those and join across them, Borgman said.

“It could be Hadoop, a traditional database like Oracle or Teradata or moving to the cloud and query data in S3,” he said.

“I think over time S3 is becoming the new data lake. It has been Hadoop, but now S3 or the equivalent — Blob Storage on Microsoft, Google Cloud Storage on Google — these are becoming the low-cost place to let your data live. And if you can query the data there without having to load it into some other platform, that’s going to save you time and money. I think that’s a big motivator for why Presto has taken off.”
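The media-company example above, one query joining viewing data in one store with billing data in another, can be sketched in a few lines. This is only an analogy, not Presto itself: it uses Python's built-in `sqlite3` module and two attached in-memory databases, and the labels `hadoop` and `teradata`, the table names and the sample rows are all made up for illustration.

```python
import sqlite3

# Two separate stores stand in for the two systems in the example.
# "hadoop" and "teradata" are just labels for attached SQLite databases,
# not real connections to those products.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS hadoop")
conn.execute("ATTACH DATABASE ':memory:' AS teradata")

# Hypothetical tables and sample rows.
conn.execute("CREATE TABLE hadoop.viewing (customer_id INTEGER, minutes_watched INTEGER)")
conn.execute("CREATE TABLE teradata.billing (customer_id INTEGER, monthly_fee REAL)")
conn.executemany("INSERT INTO hadoop.viewing VALUES (?, ?)", [(1, 120), (1, 45), (2, 300)])
conn.executemany("INSERT INTO teradata.billing VALUES (?, ?)", [(1, 9.99), (2, 14.99)])

# One SQL statement joins across both stores; the engine sits above them.
query = """
SELECT b.customer_id, b.monthly_fee, SUM(v.minutes_watched) AS total_minutes
FROM teradata.billing AS b
JOIN hadoop.viewing AS v ON v.customer_id = b.customer_id
GROUP BY b.customer_id, b.monthly_fee
ORDER BY b.customer_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In Presto the same idea is expressed by qualifying each table with the catalog it lives in, so the join reads much like the statement above even though the data sits in different systems.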

Presto works with any flavor of Hadoop — or without it. Kubernetes could further simplify things with Presto and other data technologies, Iguazio Chief Technology Officer Yaron Haviv recently wrote for InApps Technology.

Its architecture consists of one coordinator node working in sync with multiple worker nodes. Its metadata API, data location API and data stream API enable it to connect across multiple storage sources.

Through these APIs, it asks the data source for the list of tables or columns, data types and the location of the data so it can be assigned to workers that will execute the work in parallel.
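The division of labor described above can be sketched roughly as follows. This is a toy model, not Presto's actual connector interface: the class `InMemoryConnector`, its method names, the `Split` type and the `coordinator_sum` function are all hypothetical, and a thread pool stands in for worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """A chunk of a table that one worker can read independently."""
    table: str
    start: int
    end: int


class InMemoryConnector:
    """Hypothetical connector with metadata, location and stream calls."""

    def __init__(self, tables):
        self._tables = tables  # table name -> list of row dicts

    # Metadata: which tables and columns exist, and their types.
    def list_tables(self):
        return sorted(self._tables)

    def get_columns(self, table):
        first_row = self._tables[table][0]
        return {name: type(value).__name__ for name, value in first_row.items()}

    # Data location: where the data lives, cut into splits.
    def get_splits(self, table, split_size):
        n = len(self._tables[table])
        return [Split(table, i, min(i + split_size, n)) for i in range(0, n, split_size)]

    # Data stream: the rows belonging to one split.
    def read_split(self, split):
        return self._tables[split.table][split.start:split.end]


def coordinator_sum(connector, table, column, workers=3, split_size=2):
    """Plan on the coordinator, then fan splits out to parallel workers."""
    if column not in connector.get_columns(table):
        raise KeyError(f"unknown column {column!r} in table {table!r}")
    splits = connector.get_splits(table, split_size)

    def worker_task(split):
        # Each worker computes a partial result over its own split.
        return sum(row[column] for row in connector.read_split(split))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(worker_task, splits))
    return sum(partial_sums)  # the coordinator combines the partial results


connector = InMemoryConnector(
    {"orders": [{"id": i, "amount": i * 10} for i in range(1, 8)]}
)
print(connector.list_tables())
print(connector.get_columns("orders"))
print(coordinator_sum(connector, "orders", "amount"))
```

The point of the sketch is the separation: the storage side only answers questions about what exists and where, while the engine decides how to split and parallelize the work.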

Visualization and other SQL tools can be added on top of it.

Its built-in functionality includes regular expression functions, lambda expressions and the functions that accept them, and geospatial functions. It can handle complex data types including JSON, array, map and row/struct.

Much of Starburst’s work these days is helping customers use Presto across a mixture of on-prem and cloud data, Borgman said.

He pointed to Snowflake as Presto’s closest competitor on cloud data, but said users have to load data into Snowflake in its own proprietary format. Presto queries data directly in open formats, he said.

AWS also makes Presto, the project, available on its platform as part of its EMR (Elastic MapReduce) offering.

The most recent developments in Presto include a cost-based optimizer, which weighs factors such as CPU cost, memory requirements and network bandwidth usage to determine how queries can be executed as fast as possible with the fewest resources, as well as role-based access controls and security enhancements. It’s available now on Azure as well as AWS.
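To make the idea of a cost-based optimizer concrete, here is a minimal sketch of choosing between two ways of running a join by comparing estimated CPU, memory and network costs. The cost formulas, the equal weights and the names `PlanCost`, `estimate_join_plans` and `choose_plan` are assumptions invented for this illustration; they are not taken from Presto's optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCost:
    name: str
    cpu: float      # estimated CPU work, arbitrary units
    memory: float   # estimated memory held across workers, arbitrary units
    network: float  # estimated data moved between nodes, arbitrary units


def estimate_join_plans(left_rows, right_rows, workers):
    """Estimate two join strategies from table sizes (illustrative formulas)."""
    broadcast = PlanCost(
        name="broadcast",
        cpu=left_rows + right_rows * workers,
        memory=right_rows * workers,     # every worker holds the whole right side
        network=right_rows * workers,    # right side is copied to every worker
    )
    partitioned = PlanCost(
        name="partitioned",
        cpu=left_rows + right_rows,
        memory=right_rows,               # right side is spread across workers
        network=left_rows + right_rows,  # both sides are reshuffled by join key
    )
    return [broadcast, partitioned]


def total_cost(plan, weights=(1.0, 1.0, 1.0)):
    """Combine the three estimates into one comparable number."""
    w_cpu, w_mem, w_net = weights
    return w_cpu * plan.cpu + w_mem * plan.memory + w_net * plan.network


def choose_plan(left_rows, right_rows, workers):
    """Pick the candidate plan with the lowest combined estimated cost."""
    return min(estimate_join_plans(left_rows, right_rows, workers), key=total_cost)


# A small right-hand table favors copying it to every worker.
print(choose_plan(1_000_000, 1_000, workers=10).name)
# Two large tables favor reshuffling both sides by join key.
print(choose_plan(1_000_000, 800_000, workers=10).name)
```

Under these made-up formulas, the first call selects the broadcast plan and the second selects the partitioned plan, which mirrors the general trade-off: copying a small table everywhere is cheap, while copying a large one is not.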

Feature Image: “King’s Cross Station Facelift – Jan 2014 – Waiting for a Train” by Gareth Williams. Licensed under CC BY-SA 2.0.

Source: InApps.net


Phu Nguyen
