Update Apache Flink Gets Real with Continuous Stream Processing

Main Contents:

Apache Flink Gets Real with Continuous Stream Processing is an article under the topic Data Science Many of you are most interested in today !! Today, let’s InApps.net learn Apache Flink Gets Real with Continuous Stream Processing in today’s post !

Easing Complexity

“If you look at traditional architecture, people have a continuous flow of data, and they ingest into a database or into Hadoop for analytics. It becomes increasingly hard to keep up with the volume,” Tzoumas said.

It’s building out the capability to query aggregates inside Flink before it even ends up in a database, so in many cases, a database wouldn’t be required.

“What this means for data management is this transition from a batch-based approach to this streaming approach. There is very clearly a need for a compute platform in this space, for a stream processor that can handle these loads with low latency with a good match for this architecture,” he said.

It’s a good match for any high-volume data stream, such that of telcos, the financial sector, mobile gaming and especially for Internet of Things applications. One use for streaming data would be for keeping track of a set of microservices. Instead of feeding all state back to a database, capture all the events in a single process stream.

“Typical workflows being built with Hadoop and its tools to have a continuous ingression pipeline that drives the file systems, then to schedule Hive jobs or Spark jobs to access those files, then pretend you have continuous output,” he said.

And with Storm, for instance, you’re using batch groups together augmented by a stream processor for real-time results, though those results have to be corrected down the line with a batch processor, he says.

Flink’s focus is on a 24/7 stream processor with a simple-to-use API with which you can build applications that require accuracy in the stream processor.

Based on Database Tech

Flink, German for “quick” or “nimble,” grew out of Berlin’s Technical University in 2009 under the name Stratosphere. It originated as a research project aimed at combining the best technologies of MapReduce-based systems and parallel database systems, then turned its focus to data streaming.

Stephan Ewen and Kostas Tzoumas founded the commercial company data Artisans around the open source project in 2014, the same year it joined the Apache Incubator renamed as Flink.

The software includes libraries for complex event processing, machine learning, and graph processing. It touts being able to perform batch and streaming with a single runtime.

Flink uses a JVM-based data processing system that operates on serialized binary data, as do other Apache projects including Drill, Ignite and Geode. It boasts of a highly efficient garbage collection as a means to boost performance; a highly efficient data de/serialization stack that facilitates operations on binary data and makes more data fit into memory; and DBMS-style operators that operate natively on binary data yielding high-performance in-memory and with overflow sent to disk as necessary.

Flink can be processed only the parts of the data that have actually changed, significantly speeding up the job. And Flink users also can run Storm topologies to transition between the two.

It has APIs in Java and Scala in v1.0 and a beta version for Python. It runs on YARN and HDFS and has a Hadoop compatibility package. It uses RocksDB, an open source key-value store developed by Facebook, for any state data that needs to be stored.

Going forward, the company is building out SQL capabilities, writing more libraries, improving compatibility with Kafka and HDFS, for databases such as Cassandra, and building out building support for Mesos.

The 11-person Berlin-based data Artisans, including one person in San Francisco, has been focused on expanding its community rather than monetizing the project, Tzoumas said. It has more than 160 contributors.

Flink became a top-level project in January 2015, and version 1.0 was released last month which should help ease doubts about its production-readiness. The company also recently raised €5.5 million (US $6 million) in a Series A round led by Intel Capital.

Though it trails in development projects such as Spark and Storm, Intel’s Ron Kasabian, vice president in the Data Center Group and general manager of Big Data Solutions, called it “one of the most advanced and transformative stream processing systems available in the open source community.”

Capital One and Intel are sponsors of InApps.

Feature Image: “200+ mph jet stream screams over Pacific Northwest” by Peter Stevens, licensed under CC BY-SA 2.0.

Source: InApps.net

Rate this post

Phu Nguyen

As a Senior Tech Enthusiast, I bring a decade of experience to the realm of tech writing, blending deep industry knowledge with a passion for storytelling. With expertise in software development to emerging tech trends like AI and IoT—my articles not only inform but also inspire. My journey in tech writing has been marked by a commitment to accuracy, clarity, and engaging storytelling, making me a trusted voice in the tech community.