Mountain View, Calif.-based Dremio emerged from stealth on Wednesday with the aim of making data analytics self-service. It’s a goal similar to that of Qubole, though the two startups are taking different approaches.

Essentially, Dremio aims to eliminate the middle layers and the work involved between the user and the data stores, including traditional ETL, data warehouses, cubes and aggregation tables.

Two-year-old Dremio’s founders are Tomer Shiran, former vice president of product at MapR, and Jacques Nadeau, who ran the distributed systems team at MapR. Both have been active in open source: Shiran founded the Apache Drill project, and Nadeau is the creator and Project Management Committee chair of Apache Arrow.

“We created the company because we believe there’s a massive opportunity for disruption here,” Shiran explained. “Think about what Amazon was able to do for application developers… Ten years ago, if you were an application developer, you were really reliant on IT to go buy and set up resources for you. Amazon created a solution that put developers in the driver’s seat. It gave developers the ability to get their own resources and their own hardware, and they can do it almost instantaneously, in a minute.”

That’s what Dremio aims to do for business analysts and data scientists.

The company has raised more than $15 million from Lightspeed Venture Partners and Redpoint. Its management team includes big data and open source leaders from Hortonworks, Mesosphere and MongoDB.

Connecting Directly

Dremio connects to all of an organization’s data sources, from data lakes to databases, and takes care of everything in the middle. Its Arrow-based execution engine uses columnar in-memory processing to run queries against a single source or across data in several sources at once.
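A federated query looks no different from single-source SQL. As a rough sketch (the qualified names below, such as mongo.orders and postgres.customers, are hypothetical and not Dremio’s documented namespace layout):

```python
# Illustrative only: one SQL statement joining data that lives in two
# different systems. The engine plans the query across both sources and
# moves intermediate results through Arrow's columnar memory format.
cross_source_query = """
SELECT c.name, SUM(o.amount) AS total_spent
FROM   mongo.orders o
JOIN   postgres.customers c ON o.customer_id = c.id
GROUP  BY c.name
"""
```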

Dremio also optimizes the data itself, similar to the way Google optimizes data in various data structures so that search queries can be very fast, Shiran said. Dremio calls these data structures “Reflections.”
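The comparison suggests a Reflection behaves something like an engine-managed materialized view. Purely as a mental model (this is not Dremio’s DDL, and raw.events is an invented name), a Reflection over a raw events table might materialize an aggregate such as:

```python
# Mental model only, not Dremio syntax: a Reflection resembles a
# materialized view the engine maintains and substitutes at query time.
reflection_equivalent = """
SELECT region, event_date, COUNT(*) AS events, SUM(revenue) AS revenue
FROM   raw.events
GROUP  BY region, event_date
"""
# A query that groups raw.events by region and date could then be answered
# from this much smaller structure instead of rescanning the raw table.
```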

And it has a user interface much like Google Docs, except for data sets rather than documents. Users can see and explore the data themselves, create new data sets through live data curation, and interact with the data visually or through SQL.

“Everything under the hood is standard SQL, and more technical users can do anything in the power of SQL. You can create new data sets, share them with colleagues. There’s an entire data catalog in there.

“Then with a click of a button, you can launch any of these BI tools, connect it to the Dremio cluster already and start playing with the data inside Tableau without extracting any data. There are no copies of data. All the data sets and curation inside Dremio are all virtual. It’s all done at the logical layer. All the current solutions are based on data copies, and Dremio is the opposite of that,” he said.
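In other words, a curated data set behaves like a SQL view: a named query expanded at query time rather than a copy of the data. A conceptual sketch in standard SQL (analytics.clean_orders and raw.orders are invented for illustration; Dremio’s own DDL may differ):

```python
# Conceptual sketch: a "virtual dataset" is a named query, not a copy.
curated_view = """
CREATE VIEW analytics.clean_orders AS
SELECT order_id, CAST(amount AS DOUBLE) AS amount, LOWER(status) AS status
FROM   raw.orders
WHERE  status IS NOT NULL
"""
# BI tools then query analytics.clean_orders; the definition is expanded
# at query time, so no data is duplicated at the physical layer.
```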

Because the major BI tools speak SQL, Dremio forms a bridge between them and NoSQL databases such as MongoDB, automatically learning the implicit schema of each system even when no explicit schema exists.
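Shiran didn’t detail how that schema learning works internally, but the idea can be sketched: sample documents and union the fields and value types observed. A toy illustration, not Dremio’s implementation:

```python
# Toy sketch of implicit schema inference: sample schemaless documents
# and union the observed fields and value types.
docs = [
    {"name": "Ada", "age": 36},
    {"name": "Grace", "city": "NYC"},
]

schema = {}
for doc in docs:
    for field, value in doc.items():
        schema.setdefault(field, set()).add(type(value).__name__)

print(schema)  # e.g. {'name': {'str'}, 'age': {'int'}, 'city': {'str'}}
```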

“It’s kind of what Splunk did with logs,” Shiran explained. “It wasn’t that people weren’t analyzing logs before, but they were using a lot of command-line tools and loading logs into relational databases — it was just a lot of manual work. Splunk designed a solution specifically for log analytics and made it so you don’t have to glue together all these tools in order to analyze your logs.”

Standard SQL

Dremio is designed to scale from one server to thousands of servers in a single cluster. It can be deployed on Hadoop or on dedicated hardware. With Hadoop, the company recommends deploying Dremio on the Hadoop cluster itself so that raw data is local to the cache.

There are two roles in the Dremio cluster:

  • Coordinators, which plan and coordinate query execution, manage metadata and serve the UI.
  • Executors, which process queries.

Deploying coordinators on edge nodes lets external applications such as BI tools connect to them. Coordinators use YARN to provision compute capacity for the cluster, eliminating the need for manual deployment. The company recommends one executor on each Hadoop node in the cluster.
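A back-of-the-envelope layout following those recommendations (plain arithmetic with made-up node counts, not a Dremio configuration API):

```python
# Rough topology math following the recommendations above.
hadoop_nodes = 12   # data nodes in the Hadoop cluster
edge_nodes = 2      # edge nodes reachable by external BI tools

executors = hadoop_nodes   # one executor per Hadoop node
coordinators = edge_nodes  # coordinators on edge nodes for client access

print(f"{coordinators} coordinators, {executors} executors")
```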

Dremio, in effect, is an extension of the founders’ open source work. Drill is a single SQL engine that can query and join data from myriad systems. Dremio uses Apache Arrow (columnar in memory) and Apache Parquet (columnar on disk) for high-performance columnar storage and execution.
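Both projects are usable on their own; a minimal pyarrow sketch, independent of Dremio itself, shows how the in-memory and on-disk columnar formats pair up:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Build a columnar in-memory table: each column is a contiguous Arrow buffer.
table = pa.table({
    "user_id": [1, 2, 3],
    "amount": [9.99, 14.50, 3.25],
})

# Persist the same columnar layout on disk as Parquet.
pq.write_table(table, "sales.parquet")

# Reading back a projection touches only the requested column's data.
amounts = pq.read_table("sales.parquet", columns=["amount"])
print(amounts)
```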

Dremio looks like a single, high-performance relational database to any tool. You just send standard SQL queries. Meanwhile, Dremio automatically optimizes the physical organization of your data for different workloads in a cache, or it queries your data sources directly when you need access to live datasets.
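Because the interface is standard SQL over standard drivers, a client session can be as simple as the sketch below. The DSN name, credentials and table are placeholders, and it assumes Dremio’s ODBC driver is installed and configured (JDBC is the other common route):

```python
import pyodbc  # assumes a Dremio ODBC driver and a configured DSN

# "Dremio", the user and the password are placeholders, not real settings.
conn = pyodbc.connect("DSN=Dremio;UID=analyst;PWD=secret", autocommit=True)
cur = conn.cursor()

# Plain SQL, regardless of where the underlying data actually lives.
cur.execute(
    "SELECT status, COUNT(*) AS n FROM analytics.clean_orders GROUP BY status"
)
for row in cur.fetchall():
    print(row)
```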

It uses a persistent cache that can live on HDFS, MapR-FS, cloud storage such as S3, or direct-attached storage (DAS). The cache size can exceed that of physical memory, an architecture that enables Dremio to cache more data at a lower cost, producing a higher cache hit ratio compared to traditional memory-only architectures, according to the company.

It also offers native query pushdowns. Instead of performing full table scans for every query, Dremio pushes processing down into the underlying data sources, rewriting SQL into the native query language of each source, such as Elasticsearch, MongoDB and HBase, and optimizing processing for file systems such as Amazon S3 and HDFS.
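As an illustration of what a pushdown means in practice (hypothetical table name; the rewrite shown is the generic MongoDB equivalent, not Dremio’s actual output):

```python
# A SQL predicate like this...
sql = "SELECT name FROM mongo.users WHERE age > 21"

# ...would be pushed down as a native MongoDB query, so the source
# applies the filter itself instead of shipping the whole collection:
mongo_filter = {"age": {"$gt": 21}}
mongo_projection = {"name": 1, "_id": 0}
# i.e. db.users.find({"age": {"$gt": 21}}, {"name": 1, "_id": 0})
```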

Its Data Graph preserves a complete view of the flow of data. Companies have full visibility into how data is accessed, transformed, joined, and shared across all sources and all analytical environments.

Open Source Model

Dremio comes in an open source Community edition and an Enterprise edition. The Enterprise edition includes connectivity to enterprise data sources such as IBM DB2, as well as security and governance capabilities.

It can run on-premises or in the cloud. There are advantages to running Dremio in the cloud; for instance, Reflections, the optimized data structures, can be stored directly on S3, Shiran said.

“It’s a fully managed cache and you can scale your compute capacity independent of that. Say after a Black Friday, you need more analytics capacity, you spin up a few more Dremio instances, and you spin it down when you don’t need it,” he said.

Feature Image: Dremio co-founders Jacques Nadeau (right) and Tomer Shiran (Dremio).

InApps is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Dremio.