Simon Crosby
Simon Crosby is CTO at Swim, a continuous intelligence software vendor that focuses on edge-based learning for fast data. He co-founded Bromium in 2010 and now serves as a strategic advisor. Previously, he was the CTO of the Data Center and Cloud Division at Citrix Systems; founder, CTO, and vice president of strategy and corporate development at XenSource; and a principal engineer at Intel, as well as a faculty member at Cambridge University. Simon is an equity partner at DCVC, serves on the board of Cambridge in America, and is an investor in and advisor to numerous startups. He’s the author of 35 research papers and patents on a number of data center and networking topics, including security, network and server virtualization, and resource optimization and performance.

Organizations are drowning in streams of data from products, assets, apps, and infrastructure. Big-data “store-then-analyze” architectures struggle to find answers in time because data streams are boundless, but data is only ephemerally useful. High data volumes and geographically distributed sources make centralized architectures challenging and expensive.

Can databases keep up? There are hundreds to choose from, and all aim to solve hard infrastructure problems under the hood. A single API call can trigger a snapshot or transaction rollback. They offer features for analysis (even machine learning), caching, and clustered and distributed operation. They are resilient to faults and increasingly secure. What’s not to love? Perhaps a more useful question is: Can databases help enterprises seeking continuous intelligence from their data?


Databases are passive: they are updated by clients and queried by users. They don’t recompute insights when new data arrives, nor do they transform data into a stream of insights. That is not enough for continuous intelligence applications, which need an answer the moment data arrives. Databases don’t run applications and can’t make sense of data on their own.

An emerging category of software, continuous intelligence, aims to solve this problem. It delivers insights on the fly from streaming data by using that data to drive continuous analysis, learning, and prediction.

What Is Continuous Intelligence?

The goal of continuous intelligence is to always have the answer, enabling a real-time response. It is achieved through continuous analysis, learning, and prediction performed on the fly on streaming data:

  • Data-driven computation: Analysis is driven by arriving data (as opposed to queries or batches) for three reasons: users need automated reactions in real time, data streams are boundless, and real-world data is only ephemerally useful.
  • Stateful, contextual analysis: Relationships such as containment, proximity, or even analytical relationships like correlation or predictions are vital for applications that reason about the collective meaning of events. Relationships are fluid and are continuously re-evaluated.
  • Time is fundamental: Since data drives computation, and results are available immediately, the concept of time is “built in” — insights stream continuously to applications, storage, and users, and can be used to drive automation.

Each new event triggers re-evaluation of all dependencies. For example: A smart-city application might alert an inspector to stop any truck with bad braking behavior when it is predicted to be on the same road as the inspector within two minutes. Since there is no point telling an inspector to stop a truck that has already passed, responses must be real-time and situationally relevant: Only an inspector on the same street as and ahead of a flagged truck should be alerted. The application needs to deliver results for all trucks and all inspectors in the city, in real-time.
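The truck-and-inspector rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the article: each road is treated as a straight line measured in meters, trucks travel toward larger positions at a constant speed, and the names `Truck`, `Inspector`, and `inspectors_to_alert` are hypothetical. The point is that the check runs whenever a truck event arrives, rather than when someone issues a query.

```python
from dataclasses import dataclass

HORIZON_S = 120.0  # "within two minutes"


@dataclass
class Truck:
    truck_id: str
    road: str
    position_m: float   # distance along the road (assumed 1-D model)
    speed_mps: float    # positive means moving toward larger positions
    bad_braking: bool   # assumed to be computed upstream from braking data


@dataclass
class Inspector:
    inspector_id: str
    road: str
    position_m: float


def inspectors_to_alert(truck, inspectors):
    """Return (inspector_id, eta_seconds) for inspectors the truck should reach in time."""
    if not truck.bad_braking or truck.speed_mps <= 0:
        return []
    alerts = []
    for insp in inspectors:
        gap_m = insp.position_m - truck.position_m
        # Only inspectors on the same road and ahead of the truck are relevant.
        if insp.road == truck.road and gap_m > 0:
            eta_s = gap_m / truck.speed_mps
            if eta_s <= HORIZON_S:
                alerts.append((insp.inspector_id, eta_s))
    return alerts


# Called once per arriving truck event.
inspectors = [
    Inspector("i1", "A", 600.0),    # ahead, 60 s away
    Inspector("i2", "A", 2000.0),   # ahead, but 200 s away
    Inspector("i3", "B", 100.0),    # different road
    Inspector("i4", "A", -50.0),    # already passed
]
truck = Truck("t1", "A", 0.0, 10.0, bad_braking=True)
alerts = inspectors_to_alert(truck, inspectors)  # [("i1", 60.0)]
```

A real deployment would also re-run the check when an inspector moves, since either side of the relationship can change.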


Fleeting relationships, such as that between a truck and an inspector, require complex analysis that uses the positions and velocities of each truck, moment by moment, to evaluate “bad braking,” predict the truck’s route, and alert an inspector. They cannot be represented in any traditional database. Arguably, for such continuously evolving real-world systems, the idea of a database as a repository of “truth” is inappropriate anyway, because the current state of a source is less useful than its behavior over time:

  • Distributions, trends, or statistics computed from changes over time are often more useful
  • ML, regression, and other predictive tools use past behavior to predict the future
  • Often real-world data includes values that are themselves estimates or time windows
  • Complex relationships between sources can often only be found in the time domain
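As one illustration of computing statistics from changes over time without storing the stream, the sketch below uses Welford's online algorithm to maintain a running mean and variance. The article does not name a specific method; this is simply a well-known technique that fits the description, and the class name `RunningStats` is hypothetical.

```python
import math


class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new observation into the statistics in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 until two values have arrived."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def stddev(self):
        return math.sqrt(self.variance)


stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)  # one call per arriving event
```

Because each update touches only three numbers, a summary like this can live in memory next to the source it describes and remain current on every event.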

The behavior of the entire system over time is needed to ascertain its likely current state, predict its likely future states, and extract meaning. Continuous intelligence applications deliver a continuous stream of insights and responses that result from continuously interpreting the joint effects of all events on a stateful system model. The set of relationships is kept in memory as a dynamically updated graph that captures meaning, and it is updated continuously as relationships change, whether geospatially, mathematically, or otherwise. When new data is received, new insights and updated relational links are computed.

Continuous intelligence imposes two important changes on the data processing architecture:

  1. Algorithms need to be adapted to deal with boundless inputs, for example using sketches and unsupervised learning.
  2. Analysis needs to be stateful and must include (immediate) re-evaluation of dependencies to determine cascading impacts. This, in turn, leads to an in-memory architecture that avoids long round trips to a database.
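To make the idea of a sketch concrete, here is a minimal Count-Min Sketch, one common structure for approximate frequency counting over boundless inputs in fixed memory. The article mentions sketches only in general terms, so the choice of this particular structure, the class name, and the default `width` and `depth` values are assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Approximate per-item counts in fixed memory; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # A different salt per row yields a different hash function per row.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum is the best estimate.
        return min(self.rows[row][self._index(row, item)] for row in range(self.depth))


cms = CountMinSketch()
for event in ["truck-1"] * 5 + ["truck-2"] * 2:
    cms.add(event)
```

Memory use is `width * depth` counters regardless of how many events arrive, at the cost of occasional overestimates when items collide.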

SwimOS: An OSS Continuous Intelligence Platform

SwimOS is a lightweight, Apache 2.0 licensed, distributed runtime for continuous intelligence applications. Each data source is represented by a stateful, concurrent actor called a Web Agent. A Web Agent is like a concurrent digital twin of a data source: it processes its own data, but it can also execute complex business logic, evaluate predicates, and learn and predict in real time without database round trips. It can also react, delivering responses in real time.


Web Agents dynamically link to each other based on real-world relationships between the sources they represent, such as containment, proximity, or even analytical relationships like correlation. As Web Agents link to each other, they form a fluid in-memory graph of context-rich associations between data sources. A link allows concurrent, stateful Web Agents to share their in-memory states. Web Agents make and break links as they process events, based on continuously evaluated changes in the real world. The resulting in-memory graph is a bit like a live “LinkedIn for things”: Web Agents, the “intelligent digital twins” of data sources, interlink to form a graph.

The magic of linking is that it enables each Web Agent to compute concurrently using its own state and the states of the agents to which it is linked, enabling granular contextual analysis, learning and prediction, and an active response. As a result, the knock-on effects of a change to a real-world entity are immediately visible as state changes in its Web Agent and in all linked Web Agents.
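The linking idea can be illustrated with a toy model. The code below is not the SwimOS API: `Agent`, `Intersection`, `link`, and `on_event` are hypothetical names, and the model is synchronous and single-process, whereas the article describes concurrent, distributed agents. It only shows how a state change in one agent can trigger recomputation in the agents linked to it.

```python
class Agent:
    """Hypothetical stand-in for a stateful agent; not the SwimOS API."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.links = set()     # agents whose state this agent reads
        self.watchers = set()  # agents that have linked to this agent

    def link(self, other):
        self.links.add(other)
        other.watchers.add(self)

    def unlink(self, other):
        self.links.discard(other)
        other.watchers.discard(self)

    def on_event(self, **fields):
        """Apply new data, then let this agent and its watchers re-evaluate."""
        self.state.update(fields)
        self.recompute()
        for watcher in list(self.watchers):
            watcher.recompute()

    def recompute(self):
        """Override to derive state from own state and linked agents' state."""


class Intersection(Agent):
    def recompute(self):
        # Count linked vehicles that have reported a near-zero speed.
        self.state["waiting"] = sum(
            1
            for vehicle in self.links
            if "speed_mps" in vehicle.state and vehicle.state["speed_mps"] < 0.5
        )


crossing = Intersection("5th-and-main")
v1, v2 = Agent("vehicle-1"), Agent("vehicle-2")
crossing.link(v1)
crossing.link(v2)
v1.on_event(speed_mps=0.0)  # crossing.state["waiting"] is now 1
```

Here the intersection never polls its vehicles; its derived state changes as a side effect of their events, which is the behavior the paragraph above describes.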

Web Agents also act as concurrent materialized views that are continuously re-evaluated. They can link to millions of other Web Agents to derive KPIs or aggregate views. Because links, relationships, analysis, and learning all live in an application services tier of Web Agents, developers can easily add further tiers of services to an already active continuous intelligence deployment.
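A continuously re-evaluated aggregate view can be approximated by incremental maintenance: each update adjusts a running total instead of rescanning every source. The sketch below assumes a simple average KPI; `AverageView` and its method names are hypothetical and do not come from SwimOS.

```python
class AverageView:
    """Keeps an average over the latest value from each source, updated per event."""

    def __init__(self):
        self._latest = {}  # source_id -> most recent value
        self._total = 0.0

    def on_update(self, source_id, value):
        """Replace a source's contribution with its new value in O(1)."""
        previous = self._latest.get(source_id)
        if previous is not None:
            self._total -= previous
        self._latest[source_id] = value
        self._total += value

    def on_remove(self, source_id):
        """Drop a source, for example when its link is broken."""
        previous = self._latest.pop(source_id, None)
        if previous is not None:
            self._total -= previous

    @property
    def average(self):
        return self._total / len(self._latest) if self._latest else None


view = AverageView()
view.on_update("sensor-a", 10.0)
view.on_update("sensor-b", 20.0)  # view.average is now 15.0
```

The cost per event stays constant however many sources feed the view, though a long-running floating-point total can drift and may need periodic recomputation.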