Google has announced that it’s making the Cloud Dataflow SDK open source.
Cloud Dataflow, which it describes as “a platform to democratize large-scale data processing by enabling easier and more scalable access to data,” was unveiled just this June. It is still an alpha release, but is already used internally at Google, the company says.
It is aimed at relieving the operational burden on developers while providing more scalable access to data for data scientists, data analysts and others.
According to Google, the SDK will “make it easier for developers to integrate with our managed service while also forming the basis for porting Cloud Dataflow to other languages and execution environments.” The company also hopes to spur future innovation in combining stream- and batch-based processing models.
Cloud Dataflow evolved from MapReduce and technologies such as Flume and MillWheel. The underlying service is language-agnostic, though this SDK is in Java.
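To give a rough sense of the unified idea behind the model, here is a conceptual sketch in plain Java. This is not the Cloud Dataflow SDK API (the class and method names below are invented for illustration); it only shows the principle of writing transform logic once and running it over either a bounded batch or elements arriving one at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Conceptual sketch only — NOT the Cloud Dataflow SDK API.
// It illustrates the unified batch/stream idea: the same transform
// logic runs over a complete collection or per arriving element.
class UnifiedTransformSketch {
    // The transform is a pure function from input element to output element.
    static final Function<String, Integer> wordLength = s -> s.length();

    // Batch mode: apply the transform to a bounded, complete collection.
    static List<Integer> runBatch(List<String> input) {
        List<Integer> out = new ArrayList<>();
        for (String s : input) {
            out.add(wordLength.apply(s));
        }
        return out;
    }

    // Streaming mode: the identical transform, applied to each element
    // as it "arrives", with no need to wait for the full batch.
    static Integer onElement(String element) {
        return wordLength.apply(element);
    }
}
```

The point is that the pipeline author writes the transform once; whether it executes against historical data or a live stream is a property of the execution environment, not the code.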
Google wants to get more developers on board with its own approach, which is to process data as it comes in rather than in batches, and to bolster its Google Cloud Platform, which competes with the likes of Amazon Web Services and Microsoft Azure, according to VentureBeat.
Cloud Dataflow is similar to Amazon Kinesis, which aims to lower the barriers to complex real-time data processing and to feeding data streams into applications. Cloud Dataflow can also serve as an ETL alternative, preparing data for processing in BI systems, according to Cloud Times.
Feature image via Flickr Creative Commons.