Kafka is the de facto standard for collecting and storing event data, but are you comfortable using that data in your applications in real time? It's trickier than it seems: you need to integrate, secure, and scale separate systems for event storage, processing, and querying. This is why we created ksqlDB, a database purpose-built for stream processing applications.

What Is Stream Processing?

Kafka is great for working with events through its durable, append-only log. Taking action on those streams of events in real time, however, can be a challenge. This is where a stream processor such as ksqlDB comes in. KsqlDB provides a processing model for working with groups of events as if they were in-memory collections, and it makes them accessible through SQL.

Stream processing handles data events as they come in and writes the results to named, durable logs called topics on the way out. The data is immutable, so applications can read from a topic, compute new information, and then push the result into another topic as needed.

What Is a Streaming Database?

A streaming database provides a single SQL interface for building stream processing applications, instead of making you wire together many different subsystems. Rather than dealing with events and topics directly, you deal with streams and tables. A stream is a topic with a strongly defined schema. You use SQL to create these streams, define their schemas, and insert, filter, and transform data. Meanwhile, ksqlDB takes care of all the underlying managerial work of executing those statements so you can focus on developing your application.

Creating a New Stream

To define a new stream, you use a SQL CREATE statement. The command itself is ordinary SQL with a few extensions, yet it controls the underlying Kafka topics without the user ever touching Kafka directly.

The important parts of the command are the stream's schema and its key column.
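
A definition for a sensor-readings stream, keyed by a sensor column, might look like the following sketch. The reading column and the exact WITH options are illustrative assumptions rather than anything prescribed; only the readings name, the sensor key, and the location field are used later in this example.

-- Illustrative definition: the 'reading' column and these WITH options are assumptions
CREATE STREAM readings (
    sensor   VARCHAR KEY,        -- becomes the Kafka record key
    location VARCHAR,
    reading  INT
) WITH (
    kafka_topic  = 'readings',   -- backing topic that ksqlDB will create
    partitions   = 3,
    value_format = 'json'
);

The KEY marker tells ksqlDB which column to use as the Kafka record key, and the WITH clause names the backing topic, its partition count, and the value serialization format.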

Whenever a new stream is established, a new, empty Kafka topic is created and partitioned accordingly. KsqlDB stores the metadata for these definitions in an internal topic that every ksqlDB server can read, forming a global catalog of objects.

Inserting Rows into a Stream

SQL users will instantly recognize that we use standard INSERT statements to add data to a stream. Again, no knowledge of stream management, topics, partitions, and so on is needed for a developer to start adding data to these streams.

Here is some sample data you can load to follow along as we build our example.
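
The specific sensors, locations, and reading values below are made up for illustration; any rows that match the readings schema will do.

-- Illustrative sample rows; the values are invented
INSERT INTO readings (sensor, location, reading) VALUES ('sensor-1', 'wheel', 45);
INSERT INTO readings (sensor, location, reading) VALUES ('sensor-2', 'motor', 41);
INSERT INTO readings (sensor, location, reading) VALUES ('sensor-1', 'wheel', 92);
INSERT INTO readings (sensor, location, reading) VALUES ('sensor-3', 'muffler', 42);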

The database checks that each record’s schema is valid before creating the record and serializing its content.

While we talk about inserting rows into a stream, remember that the data ultimately lands in a Kafka topic as an event record as well. Applications that already use Kafka can continue to run in parallel with ksqlDB applications.

Transforming an Event Stream

Next, we start to add value to the stream by modifying its content and publishing the outcome as a new stream. No low-level custom consumer/producer code is needed to do a straightforward transform with SQL.
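
One way to express such a transform is the following sketch. Upper-casing the location field specifically is an assumption here, since any text column would work the same way; UCASE is ksqlDB's built-in upper-casing function.

-- Illustrative persistent query: copy each row, upper-casing its location field
CREATE STREAM clean AS
    SELECT sensor,
           reading,
           UCASE(location) AS location
    FROM readings
    EMIT CHANGES;    -- keep running and process new rows as they arrive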

This persistent query transforms one stream (readings, above) into a new one called clean, in which one text field has been converted to upper case.

This query effectively creates a mini-application that runs as a new stream. As the source stream receives new rows, the persistent query (pq1) works through each one and writes the result into the new stream. EMIT CHANGES keeps the query running, watching for new events as they arrive.

Behind the scenes, ksqlDB compiles the query’s physical execution plan as a Kafka Streams topology. Running as a background service, it reacts to new topic records as they arrive. Processing takes place on ksqlDB servers and is horizontally scalable across nodes.
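
If you want to see that plan for yourself, ksqlDB can list its running persistent queries and describe the topology behind any one of them. The query ID below is a placeholder; use whichever ID SHOW QUERIES reports in your environment.

-- List the running persistent queries and their generated IDs
SHOW QUERIES;

-- Show the execution plan and Kafka Streams topology for one of them
-- (CSAS_CLEAN_1 is a placeholder ID)
EXPLAIN CSAS_CLEAN_1;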

Filtering Rows out of a Stream

Just as SQL can transform a stream, a WHERE clause can filter one. No new application code is needed, and the resulting stream is created and managed in the same way.
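
For example, a derived stream that keeps only readings above some threshold might look like this sketch; the stream name high_readings and the threshold of 41 are illustrative choices.

-- Illustrative filter: keep only rows whose reading exceeds an arbitrary threshold
CREATE STREAM high_readings AS
    SELECT sensor, reading, location
    FROM clean
    WHERE reading > 41
    EMIT CHANGES;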

As you can see, we are effortlessly chaining together several streams at this point. KsqlDB mechanics propagate your data changes through the chain.

Combining Stream Operations into One

To simplify what we've built, we can get rid of the intermediate streams we don't really need.

Rather than chaining several streams together, we can combine the transformation and the filter into a single new stream with one more SQL statement.
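
A single query covering both steps might look like the following; again, the stream name and the threshold are illustrative.

-- Illustrative combined query: transform and filter in one step, skipping the intermediate stream
CREATE STREAM high_pri AS
    SELECT sensor,
           reading,
           UCASE(location) AS location
    FROM readings
    WHERE reading > 41
    EMIT CHANGES;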

Managing Partition Keys

In Kafka, partitioning controls data locality (where it resides in the cluster). The choice of how you key your records is crucial, especially if you use Kafka clients to process your data. We defined our key column (sensor) in our first example, but in ksqlDB, we can change this using a PARTITION BY clause.
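
Re-keying the combined stream by location could look like this sketch; the name by_location and the source stream high_pri carry over from the illustrative examples above.

-- Illustrative re-keying: location becomes the record key, so rows with the same
-- location are routed to the same partition
CREATE STREAM by_location AS
    SELECT *
    FROM high_pri
    PARTITION BY location
    EMIT CHANGES;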

All rows with the same location value are now co-located on the same partition, which enables more advanced stateful operations like streaming joins and incremental aggregations.

This final animation shows the overall workflow, and now you can see all the circles of similar color (same location) end up on the same partition.

Learn More about ksqlDB

We’ve only scratched the surface of how ksqlDB works, but even this much shows that its constructs are concise, composable, and elegant. They should let you develop new applications and solutions faster than before, without diluting Kafka’s core concepts.

Follow our stream processing blogs to learn about joins, scaling, fault tolerance, and how time is handled; each is a fascinating world in its own right. Until then, there’s no substitute for trying ksqlDB yourself.

Featured image via Pixabay.