Qubole Harnesses Automation to Provide Data Self-Service – InApps 2025


Key Summary

This article from InApps Technology, authored by Phu Nguyen, highlights Qubole, a Santa Clara-based startup founded in 2011 by Ashish Thusoo and Joydeep Sen Sarma (creators of Apache Hive). Qubole’s autonomous data platform, the Qubole Data Service (QDS), leverages automation to enable self-service data analytics, addressing inefficiencies seen at Facebook. Key points include:

Origin and Mission:
- Background: Stemming from Facebook’s data team frustrations, where slow data access led to decisions without data, Qubole was launched to make organizations data-driven through self-service.
- DataOps Advocacy: Thusoo and Sarma, authors of Creating a Data-Driven Enterprise with DataOps (O’Reilly Media), promote a DataOps approach, akin to DevOps, for data teams.
- Launch: Founded in 2011, first product released in 2013.
Qubole Data Service (QDS):
- Overview: A managed platform supporting open-source engines like Hadoop, Spark, Hive, Presto, Airflow, and Pig, available in Community and Enterprise editions (announced in May).
- Cloud Support: Runs on AWS, Microsoft Azure, Oracle Bare Metal Cloud, and Google Cloud Platform (Spark added in January).
- Key Feature: Decouples compute and storage for scalability, allowing data scientists to spin up hundreds of clusters and run ad hoc/batch queries in under 5 minutes.
Automation Levels:
- Infrastructure Management:
  - Workload-Aware Auto-Scaling Agent: Dynamically adjusts cluster size based on workload, supporting heterogeneous clusters with varied machine profiles.
  - Spot Shopper Agent (AWS-only): Optimizes compute instances for cost and performance.
- Data Management:
  - Data Caching Agent: Optimizes data placement for fast access, moving less-used data to cost-effective storage.
  - Guides users to structure datasets for efficiency and reuse existing data.
- Workload Management: Automates task orchestration, such as joining datasets or reusing workload outputs.
Additional Features:
- Notebooks-as-a-Service and SQL Workbench-as-a-Service: Simplifies data exploration and querying.
- API Connectors: Integrates with tools like Tableau for enhanced analytics.
- Usage: Processes 750PB of data monthly, supporting use cases like Spark for machine learning, Hive for ETL, and Presto/Hive for log analysis.
Competitive Differentiation:
- Multi-Cloud Flexibility: Runs consistently across AWS, Azure, Google Cloud, and Oracle, without retooling workloads.
- Broad Accessibility: Serves analysts, data scientists, and developers on a single platform.
- Cost Efficiency: Drives higher hardware utilization compared to competitors like Elastic and MapReduce.
- Automation and Self-Service: Offers superior automation, reducing manual overhead.
Customers: Includes Pinterest, Lyft, and Under Armour.
InApps Insight:
- InApps Technology aligns with Qubole’s automation-driven DataOps, leveraging Microsoft’s Power Platform and Azure, using Power Fx for low-code data tools and Azure Durable Functions for scalable workflows.
- Integrates Node.js, Vue.js, GraphQL APIs (e.g., Apollo), and Azure to deliver efficient data solutions, targeting startups and enterprises with Millennial-driven expectations.


Data delayed is data denied.

That was a slogan from the team that built out a self-service data platform at Facebook. Employees’ frustration at having to go through a data team to get the information they needed has been channeled into Qubole, a Santa Clara, Calif.-based startup that was generating buzz at ApacheCon recently for its focus on automation.

Facebook put data at the heart of everything it did, according to Ashish Thusoo, a member of that team, and now CEO and co-founder of Qubole.

Yet “it got to where if it was too painful to get the data, they went ahead without it; they wanted to move very, very fast,” he said. “It was a bad architecture for Facebook, and really, it’s bad for any company. … We took a step back and said, ‘This kind of architecture is very important for any organization that wants to become data-driven.’”

Thusoo and Joydeep Sen Sarma, who also created what is now Apache Hive, launched Qubole in 2011 and released their first product in 2013. Qubole aims to manage infrastructure, allowing data teams to focus on analyzing and using the data.

Thusoo and Sarma advocate a trend, similar to DevOps, that is taking place among those who operate and use data technology. They have also written a book on the subject, “Creating a Data-Driven Enterprise with DataOps,” published by O’Reilly Media.

At its core, Qubole offers managed Hadoop, Spark, Hive and Presto, as well as other open source engines such as Airflow and Pig.

In May, the company announced what it calls its “autonomous data platform”: its Qubole Data Service (QDS), available in community and enterprise editions.

The platform, the company claims, self-manages, self-optimizes and learns from your usage to run in the most efficient and economical way. It runs on Amazon Web Services, Microsoft Azure and Oracle Bare Metal Cloud. In addition, the company added Spark on Google Cloud Platform in January.

With Qubole, a data scientist can spin up hundreds of clusters on their chosen public cloud, have the system autoscale to the optimal compute levels as needed and begin creating ad hoc and/or batch queries in less than five minutes, according to the company.

The cloud enables the decoupling of compute and storage, which is key to its architecture.

The platform automates on three levels: infrastructure management, data management and workload management.

It uses application-aware autoscaling algorithms that look at the incoming workloads and create the infrastructure to match, including heterogeneous clusters (clusters with different machine profiles).
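To make the idea of workload-driven sizing concrete, here is a minimal sketch in Python. It is not Qubole's algorithm, which the article does not describe. The `ClusterState` type, the `target_node_count` function, the per-node slot count and the node bounds are all hypothetical names and values chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of a cluster's current load."""
    nodes: int          # nodes currently running
    pending_tasks: int  # tasks waiting for a free slot
    running_tasks: int  # tasks currently executing


def target_node_count(state: ClusterState, slots_per_node: int = 8,
                      min_nodes: int = 2, max_nodes: int = 100) -> int:
    """Return the node count that could run every known task at once,
    clamped to the configured bounds (all defaults are assumptions)."""
    demand = state.pending_tasks + state.running_tasks
    needed = math.ceil(demand / slots_per_node)
    return max(min_nodes, min(max_nodes, needed))
```

With 100 pending and 32 running tasks at 8 slots per node, the sketch asks for 17 nodes. An idle cluster falls back to the 2-node floor. A production autoscaler would also have to account for node start-up time and for draining work before removing nodes, which this sketch ignores.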

It advises users on how best to structure their datasets to reduce the time it takes to get answers and to increase infrastructure efficiency. It also guides users in managing workloads, for example by reusing data from existing workloads or joining data from datasets.

It also offers notebooks-as-a-service and SQL workbench-as-a-service as well as API connectors to other data tools like Tableau.

The company also issued a new set of agents, including:

- Workload-Aware Auto-Scaling Agent, which optimizes cluster size precisely to workload requirements and dynamically scales based on actual processing load.
- Spot Shopper Agent (AWS Only), which shops across AWS cloud to assemble the compute instances in the optimal combination of performance and cost.
- Data Caching Agent, which optimizes the location of your data for fast, interactive access speeds. Data accessed less frequently is intelligently moved in the background for the best performance.
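The Spot Shopper idea of assembling instances for the best mix of performance and cost can be illustrated with a small greedy sketch. This is an assumption-laden toy, not the agent's actual logic and not an AWS API. The `Offer` type, the `shop` function, the instance labels and the prices are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical spot offer; fields and values are illustrative only."""
    instance_type: str
    vcpus: int
    hourly_price: float
    available: int  # how many instances can be acquired right now


def shop(offers: list[Offer], vcpus_needed: int) -> dict[str, int]:
    """Greedily buy the cheapest vCPUs first until the demand is covered.

    Returns a mapping of instance type to instance count. Greedy selection
    is simple but not guaranteed to find the cheapest possible mix.
    """
    plan: dict[str, int] = {}
    remaining = vcpus_needed
    for offer in sorted(offers, key=lambda o: o.hourly_price / o.vcpus):
        if remaining <= 0:
            break
        wanted = -(-remaining // offer.vcpus)  # ceiling division
        count = min(offer.available, wanted)
        if count > 0:
            plan[offer.instance_type] = count
            remaining -= count * offer.vcpus
    if remaining > 0:
        raise ValueError("not enough capacity on offer")
    return plan
```

For example, given offers of `small` (4 vCPUs, $0.10, 3 available), `large` (16 vCPUs, $0.32, 2 available) and `medium` (8 vCPUs, $0.24, 10 available), a request for 40 vCPUs yields two `large` and two `small` instances, because those have the lowest price per vCPU.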

Its customers include Pinterest, Lyft and Under Armour.

Elastic and MapReduce often are considered Qubole competitors, but Thusoo says it differentiates itself in several ways:

- Choice: The same platform runs on Azure, AWS, Oracle Cloud and Google Cloud. Customers who want to build out a workload on multiple clouds, or run different workloads in different clouds, don’t have to retool those workloads.
- Analysts, data scientists and developers can all use the same platform. Competitors don’t offer the same level of automation or self-service as Qubole, he said.
- It provides this access at a fraction of the cost by driving hardware utilization more efficiently.

Users on a single account use several of its as-a-service offerings for different use cases, such as Spark for machine learning, Hive for ETL, and Presto and Hive for log analysis. Qubole is processing 750PB of data a month, Thusoo said.

“Our value proposition is that from a single platform, they can subsume all these use cases,” he said.

Feature Image: “#43 Stripes” by Trevor King, licensed under CC BY-SA 2.0.

Source: InApps.net


Anh Hoang

Anh Hoang is Head of SEO Optimization at InApps Technology, ensuring that the message and research of InApps Technology reach the most people possible while adhering to our strict journalistic standards of excellence and integrity.