
March 29, 2022 by Phu Nguyen

TileDB: Managing Big Data Storage in Multiple Dimensions


Branching Out

Last year, Cambridge, Mass.-based TileDB was spun out of a big data collaboration between Intel Labs and the Massachusetts Institute of Technology. It announced a $1 million seed round of funding in October, led by Intel Capital and Nexus Venture Partners.

Edmon Begoli, chief data architect at the Oak Ridge National Laboratory, said at the time: “I consider TileDB one of the most sophisticated, best-written solutions for high-performance scientific data management that I have seen in years.”

TileDB earlier collaborated with the Broad Institute on creating a version of TileDB called GenomicsDB. The Broad Institute stores terabytes of genomics data modeled as a huge sparse 2D array.

Open SoftENG Meeting: Stavros Papadopoulos, TileDB from ISTC Big Data on YouTube.

In a research paper, the TileDB team reports that TileDB is faster than the HDF5 dense array storage manager, the SciDB array database system for both dense and sparse arrays, and the Vertica relational column-store for dense arrays, while being at least as fast for sparse arrays.

An array in TileDB is physically stored as a directory in the underlying file system.

Its key idea is to organize array elements into ordered collections called fragments. Each fragment is dense or sparse, and it groups related array elements into regular-sized chunks of fixed capacity, which it calls data tiles. Cells that are accessed together are co-located on the disk and in memory to minimize disk seeks, page reads, and cache misses.
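The grouping of cells into regular tiles can be illustrated with a minimal Python sketch. It is not TileDB code: the helper names `tile_of` and `group_into_tiles`, and the 4x4 array with 2x2 tiles, are assumptions chosen only for illustration.

```python
def tile_of(row, col, tile_rows, tile_cols):
    """Return the (tile_row, tile_col) index of the tile that holds a cell."""
    return (row // tile_rows, col // tile_cols)


def group_into_tiles(n_rows, n_cols, tile_rows, tile_cols):
    """Group every cell of a dense 2D array into regular-sized tiles."""
    tiles = {}
    for r in range(n_rows):
        for c in range(n_cols):
            key = tile_of(r, c, tile_rows, tile_cols)
            tiles.setdefault(key, []).append((r, c))
    return tiles


# Hypothetical 4x4 dense array split into 2x2 tiles of four cells each.
tiles = group_into_tiles(4, 4, 2, 2)
for key in sorted(tiles):
    print(key, tiles[key])
```

Cells that share a tile would be stored next to each other, so a query over a small region touches only the tiles that overlap it.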

This organization turns random writes into sequential writes, and a purpose-built read algorithm keeps reads efficient.

Application needs determine the choice of global cell order. For example, if an application reads data a row at a time, the data should be laid out in row order rather than column order.
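To see why the cell order matters, the following sketch compares where the cells of one row land on disk under the two layouts. The functions `row_major` and `col_major` and the 3x4 array size are illustrative assumptions, not part of TileDB's API.

```python
def row_major(row, col, n_rows, n_cols):
    """Linear position of a cell when rows are stored one after another."""
    return row * n_cols + col


def col_major(row, col, n_rows, n_cols):
    """Linear position of a cell when columns are stored one after another."""
    return col * n_rows + row


N_ROWS, N_COLS = 3, 4

# An application that reads all of row 1.
row_read = [(1, c) for c in range(N_COLS)]

row_major_positions = [row_major(r, c, N_ROWS, N_COLS) for r, c in row_read]
col_major_positions = [col_major(r, c, N_ROWS, N_COLS) for r, c in row_read]

print(row_major_positions)  # contiguous positions
print(col_major_positions)  # positions spread across the file
```

With the row-major layout the read covers one contiguous range, while the column-major layout forces a jump between every pair of cells.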

In sparse arrays, the user specifies the data tile capacity, and TileDB then creates the data tiles so that they all hold the same number of non-empty cells, equal to that capacity.
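A rough sketch of capacity-based tiling follows. It assumes a row-major global order and a hypothetical helper `make_sparse_tiles`; TileDB's actual implementation may differ.

```python
def make_sparse_tiles(coords, capacity):
    """Sort non-empty cells in row-major order, then cut them into tiles
    that each hold `capacity` cells (the last tile may hold fewer)."""
    ordered = sorted(coords)  # tuples sort by row first, then column
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


# Hypothetical sparse array: only these six cells are non-empty.
cells = {(0, 7): "a", (5, 2): "b", (0, 1): "c",
         (9, 9): "d", (3, 3): "e", (5, 0): "f"}

tiles = make_sparse_tiles(cells.keys(), capacity=2)
for tile in tiles:
    print(tile)
```

Because tiles are defined by cell count rather than by region size, a densely populated area and a nearly empty one both produce tiles of the same size on disk.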

Writes are performed in batches, which improves performance, and each batch is written sequentially to its own fragment. Sparse fragments can be used to speed up random writes even in dense arrays.
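The batch-per-fragment idea can be sketched with plain files. The JSON file format, the `fragment_NNNNNN` naming, and the `write_fragment` helper are all assumptions made for this example and do not reflect TileDB's on-disk format.

```python
import json
import os
import tempfile


def write_fragment(array_dir, batch):
    """Write one batch of (row, col, value) cells as a new fragment file."""
    os.makedirs(array_dir, exist_ok=True)
    fragment_id = len(os.listdir(array_dir))  # next sequence number
    path = os.path.join(array_dir, f"fragment_{fragment_id:06d}.json")
    with open(path, "w") as f:
        json.dump(sorted(batch), f)  # cells sorted, written in one pass
    return path


# The array is a directory; every batch becomes a separate file inside it.
array_dir = os.path.join(tempfile.mkdtemp(), "my_array")
write_fragment(array_dir, [(2, 1, 10), (0, 0, 5)])
write_fragment(array_dir, [(2, 1, 99)])  # an update goes to a new fragment

fragments = sorted(os.listdir(array_dir))
print(fragments)
```

Existing fragments are never modified in place, so an update to a single cell costs one small sequential write instead of a seek into older data.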

The TileDB read algorithm efficiently finds the most recently updated fragment and skips unnecessary tile reads when a portion of a fragment is completely covered by a newer fragment. Because performance degrades as the number of fragments grows, a consolidation algorithm merges fragments in the background while concurrent reads and writes continue.
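A simplified sketch of "newest fragment wins" reads and of consolidation is shown here. Fragments are modeled as in-memory dictionaries, and `read_cell` and `consolidate` are hypothetical names. Unlike the background process described above, this version runs synchronously.

```python
def read_cell(fragments, coord):
    """Return the value at `coord` from the newest fragment that holds it."""
    for fragment in reversed(fragments):  # newest fragment first
        if coord in fragment:
            return fragment[coord]
    return None  # the cell is empty in every fragment


def consolidate(fragments):
    """Merge all fragments into one; newer values overwrite older ones."""
    merged = {}
    for fragment in fragments:  # oldest to newest
        merged.update(fragment)
    return [merged]


# Fragments are listed from oldest to newest.
fragments = [
    {(0, 0): 5, (2, 1): 10},
    {(2, 1): 99},
    {(4, 4): 7},
]

before = read_cell(fragments, (2, 1))
fragments = consolidate(fragments)
after = read_cell(fragments, (2, 1))
print(before, after, len(fragments))
```

The read result is the same before and after consolidation; only the number of fragments a read has to inspect changes.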

Each tile, or chunk of data, is compressed with one of several compressors, chosen according to the nature of the data, which reduces storage costs, TileDB's Stavros Papadopoulos said.
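As a loose illustration of per-tile compression, the sketch below tries three compressors from Python's standard library on a tile and keeps the smallest result. Trying every compressor is an assumption made for the example; the article only says the choice depends on the nature of the data.

```python
import bz2
import lzma
import zlib

# Standard-library compressors, used here as stand-ins for illustration.
COMPRESSORS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
DECOMPRESSORS = {"zlib": zlib.decompress, "bz2": bz2.decompress,
                 "lzma": lzma.decompress}


def compress_tile(tile_bytes):
    """Compress a tile with every compressor and keep the smallest output."""
    results = {name: fn(tile_bytes) for name, fn in COMPRESSORS.items()}
    best = min(results, key=lambda name: len(results[name]))
    return best, results[best]


tile = bytes(1000)  # a hypothetical tile of 1,000 zero bytes
name, payload = compress_tile(tile)
print(name, len(tile), "->", len(payload))
```

Storing the compressor name alongside each tile lets a reader pick the matching decompressor, so different tiles can use different codecs.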

TileDB is a C++ library and offers APIs in C, C++ and Python. APIs in R, Java and other languages are in the works.

“[They’re] trying to come up with a Python solution and the R world is trying to come up with an R solution. What we’re saying is that we should be doing a universal solution to work out all the details – have an API for you, for Python, for R, for Java. You don’t have to worry about this, just use us and you can build your own fast analytics on top,” Papadopoulos said.

Beyond its work in genomics, TileDB is looking to branch out into other fields, such as Lidar data — three-dimensional points in space — and time-series data, used heavily in financial services.

So far, it’s all still open source. TileDB, the company, is working toward building an enterprise-ready commercial offering, the timing of which depends on its next round of funding, Papadopoulos said.


Source: InApps.net
