Lightbend sponsored this post.

Mark Brewer
Mark Brewer is the President and CEO of Lightbend. Previously he served as Vice President of Business Operations for VMware’s Cloud Application Platform, where he helped build out their cloud application platform portfolio. Mark has worked with more than 10 open source technology companies as an operational leader or board member including companies like SpringSource, Covalent, ActiveState, LucidWorks and Stormpath.

For the first time, any application can take advantage of data while it is still moving from its source to a data store, before it has even been written to disk. That means the business doesn’t have to wait for a query. It can actually process that data while it’s still moving.

That’s a technology breakthrough that is introducing a whole host of new frameworks and programming abstractions in the application stack, and one that is rewriting the skillsets a modern developer team must have to face the real-time future.

Today InApps and Lightbend released the findings of a 2019 Streaming Data survey. Through the lens of more than 800 developers, we got a fresh look at the drivers for real-time data and the barriers to developing and managing applications on streaming data infrastructure. I wanted to share some high-level takeaways, and I hope you will check out the full survey findings.

AI/ML Drive Streaming Data into the Application Stack

One of the most remarkable drivers of streaming data revealed in the survey was a more than fivefold increase in AI/ML adoption over just a two-year period. Companies processing data in real time for AI/ML jumped from 6% in 2017 to 33% in 2019.

IBM chief Ginni Rometty famously predicted that “AI will change 100 percent of jobs.” Whatever the timetable on that prediction coming true, it’s obvious that today companies are placing large bets on AI and ML models to drive new opportunities for the business.

Streaming data is how you take datasets from different sources and get them into the model to come up with suggestions or answers. Whether that’s a clickstream from users on a web page for decision support, or any other AI/ML use case, the value of that data is obviously highest the closer it is to real-time.

Streaming Data Use in IoT Is Also Up 3x in Just 2 Years

The survey also showed that with this group of developer respondents, adoption of IoT pipelines more than tripled from 5% in 2017 to 16% in 2019.

From my point of view, we’re still in the infancy of industrial IoT and what will be an enormous market. But we’re also very early in the home, where today there are a number of one-off smart devices, but not much interconnectivity between them. I believe we’re two to five years away from your whole home being a mesh of devices that interoperate.

It’s worth noting that enthusiasm among adopters of IoT pipelines is dramatic, with 48% of those already incorporating IoT data saying this use case will see some of the biggest near-term growth.

Finer Points of Scale and Failure Management Matter a Lot in Streaming Systems

Containers and Kubernetes would still be viable if it weren’t for streaming workloads. But streaming and its use cases would be very hard to serve if it weren’t for Docker and Kubernetes, and specifically their ability to scale up and down in support of dynamic workloads. Streams aren’t consistent: they flow continuously, but the amount of data hitting the application always varies. So the infrastructure that supports the application needs to be able to scale up and down. I would also say that Kubernetes and the management stack around it, whether OpenShift or bare Kubernetes, provide critical resilience and scalability for the underlying infrastructure.

But for enterprises pushing streaming systems into production, there is also a crash course in the finer-grained concerns of resilience and scalability within the application. Kubernetes is great for orchestrating these boxes of software called containers. But equally important is what you put inside those containers and getting them to work as a single application. These finer-grained concerns of data consistency, persistence, latency guarantees and a host of others have given rise to frameworks like Akka, which are very complementary to Kubernetes and are becoming increasingly common bedfellows in the streaming data stack. In the survey, the top impediment to streaming data adoption is knowledge and complexity, and I believe the sharp learning curve of these finer-grained concerns is where a lot of the knowledge gaps exist.

How Developers Work with Data Will Never Be the Same

Developers used to regard data as a necessary evil and had the luxury of assuming it was ready for consumption and tailored for their applications. Relational data stores were built specifically so that applications could get access in a consistent, common way. None of that is the case any longer with streaming data.

Now we can store virtually unlimited data. Now as developers we have to get down into the minutiae of the opportunities based on that data. It’s not just that the data is moving faster and in greater volumes; it’s that the data is unstructured and isn’t being handed off from a relational database.

The applications are richer, but developers are also being forced to become smarter about data. Before, they just asked for the schema of the data and the stored procedures. Now they need to know its latency, where it’s coming from, and whether any processing happens before it reaches the application.

Scala Developers Are the Kingmakers in This Streaming Data Future

There’s a reason Scala was chosen to build frameworks like Apache Kafka and Apache Spark: Scala does really well with data because of its collections library and its functional programming feature set. It offers a much more straightforward way to deal with streaming data than any other language. Enterprises that decide to build their business applications and logic on Spark and Kafka get all of the Scala benefits of efficient code, type safety, and the assurance of catching complicated bugs before they hit production. If you’re building streaming-based applications in Scala, you’re getting the benefits of how well the language handles data manipulation.
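
To make the point about collections and functional style a little more concrete, here is a minimal sketch using only the Scala standard library. It is not taken from Kafka, Spark, or the survey. The `Click` type, its fields, and the `clicksPerPage` helper are hypothetical names invented for illustration, and a plain `Iterator` stands in for a real stream of events.

```scala
// Hypothetical event type; the field names are illustrative assumptions.
final case class Click(userId: String, page: String, millis: Long)

object ClickSummary {
  // Count clicks per page in a single pass over the events,
  // skipping malformed events that have an empty page.
  def clicksPerPage(events: Iterator[Click]): Map[String, Int] =
    events
      .filter(_.page.nonEmpty)
      .foldLeft(Map.empty[String, Int]) { (counts, click) =>
        counts.updated(click.page, counts.getOrElse(click.page, 0) + 1)
      }

  def main(args: Array[String]): Unit = {
    // An Iterator is consumed lazily, one element at a time,
    // which is roughly how events arrive from a stream.
    val events = Iterator(
      Click("u1", "/home", 1000L),
      Click("u2", "/home", 1005L),
      Click("u3", "", 1010L),
      Click("u1", "/pricing", 1020L)
    )
    println(clicksPerPage(events))
  }
}
```

The type-safety benefit shows up at compile time: a misspelled field such as `click.pag`, or an attempt to add a `String` to the running count, is rejected by the compiler rather than surfacing in production.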

Check Out Lightbend’s Survey with InApps to Learn More

The challenge is huge for enterprises trying to figure out where to start with streaming data. Which frameworks should you use, when there are so many (Akka, Spark, Kafka, Samza, Flink, Gearpump)? How do you fast-track your understanding of your business problem and how it maps back to the way other enterprises have approached the streaming data opportunity (see Lightbend case studies)? I hope you’ll find our Streaming Data 2019 report with InApps helpful in your journey to take advantage of the many exciting use cases for streaming data in your business.

Download the report, “Streaming Data and the Future Tech Stack,” at Lightbend.com.

InApps is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Docker.