Kinetica Brings the Power of GPU Parallel Processing to a Database System


Falling in

That’s the dream situation when your clients are the Army and the National Security Agency. There, Vij and colleague Nima Negahban began building interim solutions around SAP, Oracle, and IBM Netezza data warehouses. What their clients needed was a system that could evaluate data in real time and execute queries in the background: analytical queries whose criteria were themselves arriving in the system in real time. It was a concept for which these commercial products were evidently not well-suited.

Nor did open source provide an answer. “If you look to the open source community — Hadoop, and all these different flavors of NoSQL,” explained Vij, “they’re all batch-oriented. And it’s more marketing, rather than actually having a product that works.”

Vij told us the story of how his development team did produce a complete, working, big data-style, open source database stack for their military clients (who were joined, by this time, by the U.S. Postal Service). But no matter how they scaled their applications, these logistics-heavy agencies were never able to achieve real-time, deterministic performance. Put another way, imagine watching a movie whose frames were successfully synchronized, except several of them were taken out of sequence. Sure, the frames show up in time, but for deterministic applications, those missing frames are unfillable voids.

CUDA, the leading GPU acceleration library, and the compiler infrastructure that produces object code using that library are today open source projects. That is to say, Nvidia has released technologies it developed as proprietary into the open source community, and companies such as IBM have likewise open-sourced the proprietary technology they had built around those assets. That’s nowhere near the same situation as a Linux Foundation project, for example, where developers from multiple firms, working on time their respective employers pay for, jointly build something that is free and open to begin with.

Falling apart

Making the NoSQL and Cassandra stack address the needs of the NSA and USPS, Vij told us, “was like duct-taping five, ten different projects, that are loosely coupled, on different release cycles, not really meant to work with one another in a synergistic way. How can these technologies process data in real time? We took a totally different approach to it. We created a database from the ground up, with multicore devices in mind, and we aligned the data to the thousands of cores of GPUs.”

In 2013, AMD’s position as a genuine challenger in the CPU space was waning, and IBM realized it could not compete against Intel all by its lonesome. It established a strategic alliance now known as the OpenPower Foundation, whose members collectively contribute to the systems architecture that originally belonged to IBM — whose initial intent was to scale up to mainframes and scale down to Sony’s PlayStation 3 — as a collective project.

Through OpenPower, both IBM and Nvidia have championed a hardware architecture that facilitates GPU acceleration by design. This technology has forced Intel, with its venerable x86 architecture still going strong, to answer the call for more acceleration than software alone can provide, for instance by agreeing in June 2015 to acquire FPGA accelerator maker Altera for over $16 billion.

OpenPower’s efforts have also helped bring to fruition an open standard, stewarded by IBM, called the Coherent Accelerator Processor Interface. It’s a faster expansion bus than PCI-Express, upon which many GPUs rely today, and it may apply across hardware architectures including OpenPower and x86.

Falling out

The reason all this matters to little ol’ us in the scalable software space is this: A wider expansion bus will pave a new, multi-lane superhighway for a class of GPUs that has been waiting in the wings for some time for the traffic bottleneck to break apart. This new class of accelerators will enable an entirely different (though not officially new) database system architecture to emerge: one that utilizes a broader path to memory, runs parallel operations orders of magnitude faster than Hadoop or Spark on CPUs alone, and, most importantly of all, does not require continual indexing.

Ay, there’s the rub. It’s indexing, contends Kinetica, that makes today’s databases so slow, and the reason these databases need all this indexing is that they’re bound to storage volumes. HDFS made those volumes more vast, and thus opened up the world for big data, but the amount of work being done under the hood there is phenomenal. If those volumes did not have to exist, a huge chunk of the busy work that traditional databases perform today would disappear.
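To make the index-free idea concrete, here is a minimal sketch in Python of a brute-force column scan split into chunks that are evaluated side by side. It is not Kinetica's code: the function names, the `weights` column, and the use of a thread pool as a stand-in for GPU cores are all assumptions made for illustration, and Python threads will not deliver the speedups a GPU would. What it does show is the shape of the approach: no index is built or maintained, and the predicate can be anything that arrives at query time.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(values, start, stop, predicate):
    """Return the row ids in [start, stop) whose value satisfies the predicate."""
    return [i for i in range(start, stop) if predicate(values[i])]


def parallel_scan(values, predicate, workers=4):
    """Brute-force scan of one in-memory column, split into contiguous chunks.

    Each chunk stands in for the slice of data that one group of GPU cores
    would evaluate. No index is consulted or updated.
    """
    n = len(values)
    chunk = max(1, (n + workers - 1) // workers)
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda b: scan_chunk(values, b[0], b[1], predicate), bounds
        )
        # pool.map yields results in chunk order, so row ids stay sorted.
        return [row for part in parts for row in part]


# Hypothetical column of package weights; the predicate is only known
# when the query arrives.
weights = [12, 47, 3, 88, 51, 9, 64, 20]
heavy_rows = parallel_scan(weights, lambda w: w > 50)
```

The trade-off is the one the article describes: every query touches every row, which only pays off when the hardware can evaluate a very large number of rows at once.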

“These NoSQL databases are forcing organizations to redesign their data models,” argued Kinetica’s Vij. “Organizations have, for decades, relied on relational databases as primitives and tables. Moving them to key/value stores takes months or years to do. NoSQL databases right now… cannot provide real-time analytics, as they’re so reliant upon indexing and delta-indexing.”

The typical solution that NoSQL database engineers suggest involves optimization for the queries they tend to run. But that only works if — unlike the case with the NSA, which literally doesn’t know what it’s looking for until it finds it — you know what the queries will be.

“We enable organizations to do correlations on the fly and at the time of query,” the CEO continued, “and do sub-queries, and chain those together. Whereas for traditional databases and NoSQL databases, they are engineering their data schemas and models where they know what those queries are, and they optimize for that. If you don’t know what you’re going to query, that’s a non-starter for [the military].”
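As a rough illustration of what chaining sub-queries at query time can look like, the following Python sketch filters two small tables with predicates supplied on the spot, then correlates one result against the other. The tables, column names, and helper functions are hypothetical and invented for this example; they are not Kinetica's API, and a real system would run each scan in parallel rather than in a plain loop.

```python
def select_rows(table, predicate):
    """Scan every row and keep those matching a predicate given at query time."""
    return [row for row in table if predicate(row)]


def correlate(left, right, key):
    """Keep rows of `left` whose `key` value also appears in `right` (a semi-join)."""
    keys = {row[key] for row in right}
    return [row for row in left if row[key] in keys]


# Hypothetical tables, invented for illustration only.
shipments = [
    {"id": 1, "region": "east", "weight": 12},
    {"id": 2, "region": "west", "weight": 70},
    {"id": 3, "region": "east", "weight": 55},
]
delays = [
    {"id": 2, "hours": 5},
    {"id": 3, "hours": 9},
]

# Sub-query 1: delays longer than four hours.
long_delays = select_rows(delays, lambda r: r["hours"] > 4)
# Sub-query 2: eastern shipments, correlated against sub-query 1.
east = select_rows(shipments, lambda r: r["region"] == "east")
delayed_east = correlate(east, long_delays, "id")
# Final aggregate over the chained result.
total_weight = sum(r["weight"] for r in delayed_east)
```

Nothing here was prepared for these particular questions: no schema was tuned and no index was built for `region` or `hours`, which is the property Vij says matters when the queries cannot be known ahead of time.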

What Amit Vij is suggesting is that Kinetica, and GPU-accelerated or FPGA-accelerated platforms like it, are starting to enable containerized applications to address both the big data problem from a decade ago, and the real-time database problem of today, in a manner that was not possible when the current open source database stack was being conceived back then.

Monday at the GPU Technology Conference, we’re likely to see more new examples of this assertion, as analytics products maker Fuzzy Logix launches its partnership with Kinetica. That partnership should lead to the development of real-time financial projection and risk analysis applications, of the scope relegated exclusively to the supercomputing space just a few years ago.

“Databases and analytics engines need to start leveraging the hardware of today,” said Kinetica’s Vij. “GPUs are following Moore’s Law, where the software is not.”

Source: InApps.net


Phu Nguyen
