Spark Closes in on Real-Time Processing with Redis Pairing

Redis Labs has released a connector that lets the Spark data processing platform use the Redis in-memory data store.

Using Redis for Spark will allow users to “store a huge amount of data without paying a significant amount of money for infrastructure,” explained Yiftach Shoolman, co-founder and Chief Technology Officer of Redis Labs, noting that Redis can be a lower cost alternative to a full-fledged in-memory database system. “Today we want the big data performance to be as close to real-time as possible. That is what we try to do.”

Specifically, the open source Spark-Redis connector package provides an easy way to run Spark SQL queries against data stored in Redis.
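As a rough illustration of what such a query could look like, the sketch below uses PySpark. The data source name and option keys (`org.apache.spark.sql.redis`, `table`, `key.column`) are those documented for later spark-redis releases, so the version described in this article may differ. The table name `person`, the columns `name` and `age`, and both helper functions are hypothetical. The snippet also assumes that the spark-redis package is on Spark's classpath and that the session was configured with `spark.redis.host` and `spark.redis.port`.

```python
from typing import Dict

# Data source name used by spark-redis for DataFrame reads and writes
# (taken from later spark-redis documentation; an assumption for this release).
REDIS_FORMAT = "org.apache.spark.sql.redis"


def redis_table_options(table: str, key_column: str) -> Dict[str, str]:
    """Build reader options for a Redis-backed table (hypothetical helper)."""
    return {"table": table, "key.column": key_column}


def query_people(spark, min_age: int):
    """Load the hypothetical `person` table from Redis and query it with Spark SQL.

    `spark` is an existing SparkSession that already points at a Redis server.
    """
    df = (
        spark.read.format(REDIS_FORMAT)
        .options(**redis_table_options("person", "name"))
        .load()
    )
    # Register the Redis-backed DataFrame so it can be queried by name.
    df.createOrReplaceTempView("person")
    # From here on this is ordinary Spark SQL; Redis is only the storage layer.
    return spark.sql(f"SELECT name, age FROM person WHERE age >= {int(min_age)}")
```

Given such a session, `query_people(spark, 30).show()` would display the matching rows. Keeping the option-building in its own function makes the Redis-specific part easy to inspect without starting Spark.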

Running Spark against a Redis data store can speed processing by 135 times compared with using HDFS (the Hadoop Distributed File System), and is 45 times faster than using the Tachyon in-memory data store, according to benchmarks from Redis Labs.

Redis Labs is eager to make Redis the de-facto data store for Spark, Shoolman asserted.

The package is a library for writing to and reading from a Redis cluster. It exposes all of Redis’ data structures – strings, hashes, lists, sets, sorted sets, bitmaps and HyperLogLogs – as Spark RDDs (Resilient Distributed Datasets) or through the Spark Dataset API.
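To make the idea of exposing Redis structures as RDDs more concrete, here is a minimal pure-Python sketch of how each structure might flatten into the records an RDD would hold. It is not the connector's code. The function `to_records`, the type tags and the sample values are all hypothetical, and a plain dict stands in for both Redis and Spark. Bitmaps and HyperLogLogs are left out for brevity.

```python
from typing import Any, Iterator


def to_records(key: str, kind: str, value: Any) -> Iterator[Any]:
    """Flatten one Redis-style value into RDD-like records (illustrative only)."""
    if kind == "string":
        # A string key becomes a single (key, value) pair.
        yield (key, value)
    elif kind == "hash":
        # A hash becomes one (field, value) pair per field.
        yield from sorted(value.items())
    elif kind == "list":
        # A list keeps its order, one element per record.
        yield from value
    elif kind == "set":
        # A set is unordered; sort here only to make the output deterministic.
        yield from sorted(value)
    elif kind == "zset":
        # A sorted set becomes (member, score) pairs ordered by score.
        yield from sorted(value.items(), key=lambda pair: pair[1])
    else:
        raise ValueError(f"unsupported kind: {kind}")


# Hypothetical sample data standing in for a Redis keyspace.
SAMPLE = {
    "user:1:name": ("string", "ada"),
    "user:1:profile": ("hash", {"lang": "en", "tz": "UTC"}),
    "user:1:events": ("list", ["login", "view", "logout"]),
    "leaderboard": ("zset", {"ada": 42.0, "bob": 17.5}),
}

for sample_key, (sample_kind, sample_value) in SAMPLE.items():
    print(sample_key, list(to_records(sample_key, sample_kind, sample_value)))
```

In a real deployment the connector would do this work in parallel across Spark partitions. The sketch shows only the shape of the records, not how they are distributed.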

The library minimizes the overhead that occurs with serialization and deserialization of large amounts of data.

Spark itself has emerged as the chief successor to the Hadoop data processing platform thanks in no small part to an ability to process data in near-real time, rather than the batch processing of ‘big data’ that Hadoop originally offered.

“Apache Spark is becoming a default in-memory engine for high-performance data integration and analytics,” said Matt Aslett, research director, data platforms and analytics at 451 Research, in a statement. “The combination of Redis and Spark should enable high-performance, real-time analytics with extremely large and variable datasets.”

Source: InApps.net

Phu Nguyen

As a Senior Tech Enthusiast, I bring a decade of experience to the realm of tech writing, blending deep industry knowledge with a passion for storytelling. With expertise ranging from software development to emerging tech trends like AI and IoT, I write articles that not only inform but also inspire. My journey in tech writing has been marked by a commitment to accuracy, clarity, and engaging storytelling, making me a trusted voice in the tech community.