March 29, 2022 by Phu Nguyen

For Pepperdata, Spark-on-Kubernetes Is the Ticket off of Big Data Island

Much like Gilligan and crew, Big Data needs to get off the island, according to Sean Suchter, co-founder and CTO of Pepperdata.

Formerly at Yahoo, one of the first users of Hadoop, he saw this whole Big Data stack — HDFS, MapReduce, YARN — developed to deal with problems of scale.

“Deliberately and by design, it was separate from everything else that IT could do,” he explained.

That led to this disconnect: All the things you could do with mainstream IT — all the tools, log management, storage systems — were totally inaccessible to the Big Data universe.

“So you got what I call the mainland and the island. The mainland of mainstream IT and the island of Big Data. It doesn’t share technologies, it doesn’t share hardware, it doesn’t share people or expertise, it doesn’t share tools. And it’s been that way for more than a decade. You don’t get to take advantage of advancements of one or the other,” he said.

He sees Spark running on Kubernetes, the open source container orchestration tool managed by the Cloud Native Computing Foundation, as the way to get Big Data off the island.

The company maintains that Spark is orders of magnitude faster than MapReduce, easier to code, and more flexible.

As Pepperdata, which helps customers solve Big Data issues with Hadoop and Spark, began investigating running Spark and HDFS natively on Kubernetes, it found a community of companies — Google, Red Hat, Palantir, Bloomberg to name a few — working on the same issues. The Spark on Kubernetes Special Interest Group was formed as a fork of Spark. Kubernetes is expected to become core to Spark in the next release, due out in a few months.

That will give users a fourth way to run Spark, beyond standalone, YARN and Mesos.
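
For readers who have not switched cluster managers before, the choice mostly surfaces in the master URL handed to `spark-submit`. The sketch below is illustrative only: the URL schemes are the ones upstream Spark accepts for its `--master` flag, while the helper names, host names and ports are hypothetical placeholders, not anything Pepperdata or the SIG published.

```python
# Master URL formats accepted by spark-submit's --master flag.
# The helper names, hosts and ports used with them are illustrative assumptions.
MASTER_URL_FORMATS = {
    "standalone": "spark://{host}:{port}",
    "yarn": "yarn",  # YARN reads cluster details from the Hadoop configuration
    "mesos": "mesos://{host}:{port}",
    "kubernetes": "k8s://https://{host}:{port}",
}


def master_url(manager, host="", port=0):
    """Build the --master value for the given cluster manager."""
    try:
        template = MASTER_URL_FORMATS[manager]
    except KeyError:
        raise ValueError(f"unknown cluster manager: {manager}") from None
    return template.format(host=host, port=port)


def spark_submit_args(manager, app, host="", port=0):
    """Return a spark-submit argument list targeting the chosen manager."""
    return ["spark-submit", "--master", master_url(manager, host, port), app]


if __name__ == "__main__":
    # Hypothetical API server address and application file.
    print(" ".join(spark_submit_args("kubernetes", "app.py", "k8s.example.com", 6443)))
```

Only the `--master` value changes between the four options in this sketch; a real Kubernetes submission would also need settings such as a container image, which are left out here.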

While Kubernetes 1.8 added native support for Spark, it’s taking more work to make Spark fully speak Kubernetes, he said.

“Kubernetes already gives you all this flexibility where you can describe your pods, you have daemon sets, replica controllers, and primitives. We got a basic version working using primitives that already existed,” he said.
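
As a rough illustration of what "describing a pod" can look like, the sketch below builds a minimal pod description for a single Spark executor as a plain Python dictionary and prints it as JSON. The field names follow the standard Kubernetes Pod schema; the function name, labels, image name and resource figures are assumptions made for this example and do not come from the project.

```python
import json


def executor_pod(name, image, cpu="1", memory="2Gi"):
    """Return a minimal Kubernetes Pod description for one executor.

    The labels, container name and default resource requests are
    illustrative assumptions, not values used by Spark itself.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "labels": {"app": "spark", "role": "executor"},
        },
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "executor",
                    "image": image,
                    "resources": {"requests": {"cpu": cpu, "memory": memory}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image.
    pod = executor_pod("spark-exec-1", "example.com/spark:latest")
    print(json.dumps(pod, indent=2))
```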

At Spark Summit 2017, Google software engineer Anirudh Ramanathan explained that running the two together gives operators less infrastructure to manage and gives developers a single interface for all their workloads. It also improves infrastructure utilization, and the huge Kubernetes ecosystem adds a host of services that can be immediately available to Spark users, such as the recently launched Istio service mesh project.

Open Apache Spark on Kubernetes – Anirudh Ramanathan & Tim Chen on YouTube.

Two big areas of work have been on security and scheduling.

Kubernetes has security primitives, but they don’t really extend to all the arbitrary users within an enterprise, he said, so there were extensions to make it work the way that Big Data systems using the Kerberos network authentication protocol do.

It supports Kerberos-based authentication to secure access to the overall environment and to protect credentials used to access applications.

There’s also been work to deal with scheduling the ever-changing workload of ephemeral microservices.

Open HDFS on Kubernetes—Lessons Learned – Kimoon Kim on YouTube.

The project includes a data locality function to make it faster to access data across distributed instances of HDFS on Kubernetes. It would allow users to manage all the silos where data resides, regardless of whether they are deployed on-premises or in a cloud.
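
The general idea behind data locality can be shown with a toy placement rule: prefer a node that already holds a replica of the data, and fall back to any free node otherwise. This is a simplified sketch of the concept, not the project's implementation; the function name and node names are hypothetical.

```python
def pick_node(block_replicas, free_nodes):
    """Choose a node for a task that reads one HDFS block.

    Returns (node, is_local). A free node holding a replica of the block
    is preferred; otherwise any free node is used and the read is remote.
    Candidates are sorted only to make the choice deterministic.
    """
    local = sorted(set(block_replicas) & set(free_nodes))
    if local:
        return local[0], True
    remote = sorted(free_nodes)
    if not remote:
        raise ValueError("no free nodes available")
    return remote[0], False


if __name__ == "__main__":
    # Hypothetical cluster: the block lives on node-b and node-c.
    print(pick_node(["node-b", "node-c"], ["node-a", "node-c"]))
    print(pick_node(["node-b"], ["node-a"]))
```

A real scheduler would weigh locality against queueing delay and resource fit; this sketch only captures the preference itself.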

“There will be Helm charts so users can set up storage on their Kubernetes system and use that as secure Big Data store. Secured and high performance. That’s been one of the traditional problems with using cluster fabric on bare metal that Big Data systems are really resource hungry. The powerful thing about Kubernetes is the abstraction to set these things up in a way that can have the same kind of performance you get out of bare metal. With most [Big Data] systems, that wasn’t really true,” he said.

Getting off the island will mean companies can just run another project on their Kubernetes cluster without a clunky, multiple system architecture. He predicts that within two years a company could run analytics — a machine learning project that feeds back into a user-facing application, for instance — all in one system.

Focus on Spark

Pepperdata has intensified its focus on Spark. Features like easy integration, built-in machine learning and support for streaming data are driving the boost in Spark adoption, according to experts in its “Production Spark” webinar series.

The company recently announced Code Analyzer for Spark, which gives developers the ability to connect performance issues to the blocks of code causing the problem.

In March, it released Application Profiler, a software-as-a-service version of LinkedIn’s Dr. Elephant, the open source tool that helps users of Hadoop and Spark analyze and improve the performance of their flows.

One new option for unifying Big Data with other IT infrastructure comes with SAP Vora, an in-memory computing engine for HANA that runs on Red Hat’s Kubernetes-based OpenShift Container Platform. It gathers data from Spark, Hadoop or directly from cloud environments.

At the same time, it seems Hadoop is falling out of favor.

Earlier this year, Gartner declared Hadoop obsolete, citing the complexity and questionable usefulness of the entire Hadoop stack. It noted many organizations are instead looking at cloud-based options with on-demand pricing and fit-for-purpose data processing.

Cloudera removed Hadoop from the formerly named Strata + Hadoop World conference.

At ApacheCon North America in May, Cloudera’s Daniel Templeton recommended waiting for Hadoop 3.0 before deploying Docker containers, citing security and other issues.

Hortonworks similarly renamed its Hadoop Summit line of conferences last year to DataWorks Summit to reflect the greater role of data-streaming architectures.

The Cloud Native Computing Foundation, Google and Red Hat are sponsors of InApps Technology.

Feature Image: “Island” by driver Photographer, licensed under CC BY-SA 2.0.

InApps Technology is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Island, Docker.

Source: InApps.net
