Update How MemSQL Enables Exactly-Once Semantics with Apache Kafka

March 29, 2022 by Anh Hoang

Key Summary

Overview: The article explains how MemSQL (since rebranded as SingleStore), a distributed, real-time relational database, integrates with Apache Kafka to achieve exactly-once semantics, so that data is delivered reliably and consistently in high-throughput, distributed systems.
What is MemSQL (SingleStore)?:
- Definition: A distributed, in-memory, and disk-based relational database optimized for real-time analytics and transactional workloads.
- Key Features:
  - Supports SQL for querying structured data.
  - Scales horizontally across nodes for high performance.
  - Combines transactional (OLTP) and analytical (OLAP) capabilities.
  - Integrates seamlessly with streaming platforms like Apache Kafka.
What is Apache Kafka?:
- Definition: An open-source, distributed streaming platform for handling high-volume, real-time data streams.
- Key Features:
  - Publishes and subscribes to data streams via topics.
  - Ensures fault tolerance and scalability with partitioned logs.
  - Supports stream processing with Kafka Streams and Connect APIs.
Exactly-Once Semantics (EOS):
- Definition: A data processing guarantee ensuring each message in a stream is processed exactly once, avoiding duplicates or data loss.
- Challenges:
  - At-Least-Once: May process messages multiple times (duplicates).
  - At-Most-Once: May skip messages (data loss).
  - Exactly-Once: Requires complex coordination to ensure reliability in distributed systems.
- Importance: Critical for applications requiring data accuracy (e.g., financial transactions, inventory management).
How MemSQL Enables Exactly-Once Semantics with Kafka:
- Integration Mechanism:
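  - MemSQL ingests Kafka streams through MemSQL Pipelines, a native feature that connects to the data source, applies limited transformations as data streams in, and loads the result into the database.
  - Since MemSQL 6.5, a Pipeline can also pass each batch to a stored procedure, which extends the transformations available during ingestion.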
- Key Components:
  - Kafka Transactions: Kafka’s transactional API (introduced in version 0.11) ensures atomic writes across partitions, enabling EOS.
  - MemSQL Pipelines: Processes Kafka messages with idempotent operations, ensuring no duplicates.
  - Offset Management: MemSQL tracks Kafka offsets within its database, ensuring consistent message processing even during failures.
- Process:
  - Data Ingestion:
    - MemSQL Pipelines pull messages directly from Kafka topics.
    - Messages are processed in batches, with offsets stored in MemSQL.
  - Transactional Processing:
    - MemSQL wraps Kafka message processing in database transactions.
    - Idempotent operations ensure reprocessed messages don’t create duplicates.
  - Offset Commit:
    - After successful processing, MemSQL commits Kafka offsets transactionally.
    - If a failure occurs, uncommitted offsets allow safe retries without data loss or duplication.
  - Exactly-Once Guarantee:
    - Combines Kafka’s transactional API with MemSQL’s ACID-compliant transactions.
    - Ensures each message is processed and committed exactly once, even in distributed, failure-prone environments.
- Example Workflow:
  - A financial app streams transaction data via Kafka.
  - MemSQL ingests messages, updates account balances, and commits offsets in a single transaction.
  - If a node fails, MemSQL retries from the last committed offset, avoiding duplicate transactions.
Benefits:
- Reliability: Exactly-once semantics ensure data consistency for critical applications.
- Performance: MemSQL’s in-memory processing and Kafka’s scalability handle high-throughput streams.
- Cost Efficiency:
  - Optimized resource usage reduces infrastructure costs.
- Simplified Architecture: Combines streaming and relational data processing in one platform.
- Scalability: Supports large-scale, real-time workloads across distributed clusters.
Challenges:
- Complexity: Configuring Kafka transactions and MemSQL Pipelines requires expertise.
- Resource Usage: In-memory processing may increase memory demands for large datasets.
- Latency: Transactional commits add slight overhead compared to at-least-once processing.
- Version Dependency: Requires compatible Kafka (0.11+) and SingleStore versions for EOS.
Security Considerations:
- Encryption: Enable TLS for Kafka and SingleStore to secure data in transit.
- Access Control: Use Kafka’s ACLs and SingleStore’s RBAC to restrict topic and database access.
- Monitoring: Implement logging and metrics (e.g., via Prometheus) to detect processing anomalies.
Use Cases:
- Financial Services: Processing real-time transactions with guaranteed accuracy (e.g., payment systems).
- E-commerce: Managing inventory updates from high-volume order streams.
- IoT: Analyzing sensor data streams for predictive maintenance with no data loss.
- Real-Time Analytics: Building dashboards with up-to-date metrics from Kafka streams.
Recommendations:
- Use SingleStore Pipelines to ingest from Kafka with exactly-once semantics.
- Enable Kafka transactions and SingleStore’s ACID compliance for reliable processing.
- Monitor performance with tools like Prometheus to optimize throughput and latency.

How MemSQL Works with Kafka

At MemSQL, we make fast, scalable, relational database software, with SQL support. MemSQL works in containers, virtual machines, and in multiple clouds — anywhere you can run Linux.

If you aren’t familiar, this relatively novel combination of attributes (the scalability formerly available only with NoSQL, along with the power, compatibility, and usability of a relational SQL database) makes MemSQL a leading light in the NewSQL movement, along with Amazon Aurora, Google Spanner, and others. The ability to combine scalable performance, ACID guarantees, and SQL access to data is relevant anywhere that people want to store, update, and analyze data, from a venerable on-premises transactional database to ephemeral workloads running in a microservices architecture.

Of course, we think NewSQL is important. NewSQL allows database users to combine the main benefit of NoSQL — scalability across industry-standard servers — and the many benefits of traditional relational databases, which can be summarized as schema (structure) and SQL support.

In our role as NewSQL stalwarts, Apache Kafka is one of our favorite things. One of the main reasons is that Kafka, like MemSQL, supports exactly-once semantics. In fact, Kafka is somewhat famous for this, as shown in my favorite headline from InApps: Apache Kafka 1.0 Released Exactly Once.

What Is Exactly-Once?

To briefly describe exactly-once, it’s one of three alternatives for processing a stream event — or a database update:

- At-most-once. This is the “fire and forget” of event processing. The initiator puts an event on the wire, or sends an update to a database, and doesn’t check whether it’s received or not. Some lower-value Internet of Things streams work this way, because updates are so voluminous, or may be of a type that won’t be missed much. (Though you’ll want an alert if updates stop completely.)
- At-least-once. This is checking whether an event landed, but not making sure that it hasn’t landed multiple times. The initiator sends an event, waits for an acknowledgment, and resends if none is received, repeating until it gets an acknowledgment. However, the initiator doesn’t bother to check whether one or more of the non-acknowledged event(s) got processed, along with the final, acknowledged one that terminated the send attempts. (Think of adding the same record to a database multiple times; in some cases, this will cause problems, and in others, it won’t.)
- Exactly-once. This is checking whether an event landed, and freezing and rolling back the system if it doesn’t. Then, it will resend and repeat until the event is accepted and acknowledged. If an event doesn’t make it, all the operators on the stream stop and roll back to a “known good” state. Then, processing is restarted. This cycle is repeated until the errant event is processed successfully.
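
To make the difference between the three concrete, here is a small simulation. It is a sketch under stated assumptions: the `deliver` function and its scripted `lost_sends` and `lost_acks` sets are invented for illustration, and the exactly-once branch uses receiver-side deduplication as a simple stand-in for the rollback-and-replay cycle described above. The observable outcome is the same: each event is applied once.

```python
def deliver(mode, events, lost_sends=frozenset(), lost_acks=frozenset()):
    """Return the events the sink ends up applying, in order.

    lost_sends / lost_acks hold (event, attempt) pairs: the scripted
    attempts on which the message, or its acknowledgment, is dropped.
    """
    applied = []   # everything the sink processed
    seen = set()   # sink-side memory, consulted only in exactly-once mode
    for event in events:
        attempt = 0
        while True:
            attempt += 1
            arrived = (event, attempt) not in lost_sends
            if arrived and not (mode == "exactly-once" and event in seen):
                applied.append(event)
                seen.add(event)
            acked = arrived and (event, attempt) not in lost_acks
            # Fire-and-forget never retries; the other modes resend until acked.
            if mode == "at-most-once" or acked:
                break
    return applied


events = ["a", "b", "c"]
lost_sends = {("a", 1)}   # the first send of "a" never arrives
lost_acks = {("b", 1)}    # "b" arrives, but its first acknowledgment is lost

for mode in ("at-most-once", "at-least-once", "exactly-once"):
    print(mode, deliver(mode, events, lost_sends, lost_acks))
# at-most-once ['b', 'c']
# at-least-once ['a', 'b', 'b', 'c']
# exactly-once ['a', 'b', 'c']
```

At-most-once loses "a", at-least-once applies "b" twice, and only the third mode ends with every event applied a single time.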

MemSQL Pipelines provide exactly-once semantics when connected to the right message broker.

How MemSQL Joins In with Pipelines

The availability of exactly-once semantics in Kafka gives an opportunity to other participants in the processing of streaming data, such as database makers. MemSQL saw this early. The MemSQL Pipelines capability was first launched in the fall of 2016, as part of MemSQL 5.5; you can see a video here. (There’s much more about the Pipelines feature in our documentation — original and updated. We also have specific documentation on connecting a Pipeline to Kafka.)

The Pipelines feature basically hotwires the well-known ETL (Extract, Transform, and Load) process by connecting to a data source, handling some limited changes to data as it streams in, and loading it into the MemSQL database.

From the beginning, Pipelines have supported exactly-once semantics. When you connect a message broker with exactly-once semantics, such as Kafka, to MemSQL Pipelines, we support exactly-once semantics on database operations.

The key feature of Pipelines is that it’s fast. That’s vital to exactly-once semantics, which amount to a promise to back up and try again whenever an operation fails.

Like most things worth having in life, exactly-once semantics places certain demands on those who wish to benefit from them. Making the exactly-once promise make sense requires two things:

- Having few operations fail.
- Running each operation so fast that retries, when needed, are not too extensive or time-consuming.

If these two conditions are both met, you get the benefits of exactly-once semantics without a lot of performance overhead, even when crashes occur. If either of these conditions is not met, the costs can start to outweigh the benefits.
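
A back-of-the-envelope calculation shows why both conditions matter. The sketch below assumes, purely for illustration, that each attempt fails independently with a fixed probability and that a failed attempt costs as much time as a successful one; the function name and the sample numbers are hypothetical, not MemSQL measurements.

```python
def expected_batch_seconds(attempt_seconds, failure_prob):
    """Mean time to get one batch committed when failed attempts are retried.

    Assumes each attempt fails independently with probability failure_prob
    and that a failed attempt costs as much as a successful one.
    """
    if not 0 <= failure_prob < 1:
        raise ValueError("failure_prob must be in [0, 1)")
    # The number of attempts until the first success follows a geometric
    # distribution with mean 1 / (1 - failure_prob).
    return attempt_seconds / (1 - failure_prob)


fast_and_reliable = expected_batch_seconds(0.1, 0.01)
slow_and_flaky = expected_batch_seconds(2.0, 0.2)
print(f"fast, reliable: {fast_and_reliable:.3f} s per batch")
print(f"slow, flaky:    {slow_and_flaky:.3f} s per batch")
```

Under these assumptions, a 1% failure rate adds roughly 1% to the average batch time, while a 20% failure rate adds 25%, and that penalty is paid on top of an attempt that was already slow.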

MemSQL 5.5 met these challenges, and the Pipelines capability is popular with our customers. However, note the word “limited” above: the Pipeline “handles some limited changes to data.” For Pipelines to really replace the whole ETL process, we needed to, well, widen the pipe.

So, in the recent MemSQL 6.5 release, we announced Pipelines to stored procedures. This feature does what it says on the tin: you can write SQL code and attach it to MemSQL Pipelines. Adding custom code greatly extends the transformation capability of Pipelines.

Stored procedures can both query MemSQL tables and insert into them, which means the feature is quite powerful. However, in order to meet the desiderata for exactly-once semantics, there are limitations on it. Stored procedures are MemSQL-specific; third-party libraries are not supported, and developers have to be thoughtful as to overall system throughput when using stored procedures.

Because MemSQL is SQL-compliant, stored procedures are written in standard ANSI SQL. And because MemSQL is very fast, developers can fit a lot of functionality into them, without disrupting exactly-once semantics.

Fast and Flexible

The Pipelines capability is not only fast; it’s also flexible, both on its own and when used with other tools. That’s because more and more data processing components can support exactly-once semantics.

For instance, here are two ways to enrich a stream with outside data. The first is to create a stored procedure to do the work in MemSQL.

The following stored procedure uses an existing MemSQL table to join an incoming IP address batch with existing geospatial data about its location:

CREATE PROCEDURE proc(batch query(ip varchar, ...))
AS
BEGIN
    -- Enrich each incoming batch row with the geopoint stored for its IP prefix.
    INSERT INTO t
      SELECT batch.*, ip_to_point_table.geopoint
      FROM batch
      JOIN ip_to_point_table
      ON ip_prefix(ip) = ip_to_point_table.ip;
END
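
If you want to experiment with the shape of this join without a MemSQL cluster, the following sketch reproduces it with Python's built-in `sqlite3`. Several things here are assumptions made for illustration: the `ip_prefix` rule (keep the first three octets), the column layouts, the sample rows, and the `make_db` and `load_batch` helpers. It mimics what the procedure does to one batch, and is not how Pipelines invoke it.

```python
import sqlite3


def ip_prefix(ip):
    """Hypothetical prefix rule: keep the first three octets of an IPv4 address."""
    return ".".join(ip.split(".")[:3])


def make_db():
    """Build an in-memory database with a lookup table, a target table, and a batch table."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("ip_prefix", 1, ip_prefix)  # expose the Python function to SQL
    conn.executescript("""
        CREATE TABLE ip_to_point_table (ip TEXT PRIMARY KEY, geopoint TEXT);
        CREATE TABLE t (ip TEXT, geopoint TEXT);
        CREATE TABLE batch (ip TEXT);
    """)
    conn.executemany(
        "INSERT INTO ip_to_point_table VALUES (?, ?)",
        [("10.0.0", "POINT(-122.4 37.8)"), ("192.168.1", "POINT(2.35 48.85)")],
    )
    conn.commit()
    return conn


def load_batch(conn, ips):
    """Stage one batch and run the enrichment join inside a single transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("DELETE FROM batch")
        conn.executemany("INSERT INTO batch VALUES (?)", [(ip,) for ip in ips])
        conn.execute("""
            INSERT INTO t
            SELECT batch.ip, ip_to_point_table.geopoint
            FROM batch
            JOIN ip_to_point_table
              ON ip_prefix(batch.ip) = ip_to_point_table.ip
        """)


conn = make_db()
load_batch(conn, ["10.0.0.7", "192.168.1.20", "8.8.8.8"])
for row in conn.execute("SELECT ip, geopoint FROM t ORDER BY ip"):
    print(row)
```

As in the stored procedure, this is an inner join, so an address with no matching prefix (here `8.8.8.8`) is not written to `t`.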

(For a lot more on what you can do with stored procedures, see — wait for it — our documentation, which also describes how to add SSL and Kerberos to a Kafka pipeline.)

You can also handle the transformation with Apache Spark, and you can do it in such a way as to support exactly-once semantics, as described in this article. (As the author, Ji Zhang, puts it very well: “But surely knowing how to achieve exactly-once is a good chance of learning, and it’s a great fun.”)

Once Apache Spark has done its work, stream the results right on into MemSQL via Pipelines. (Which were not available when we first described using Kafka, Spark, and MemSQL to power a model city.)

Use Kafka, Spark, MemSQL Pipelines, and stored procedures for operational flexibility with exactly-once semantics (MemSQL).

Try it Yourself

You can try all of this yourself, quickly and easily. MemSQL software is now available for free, with community support, up to a fairly powerful cluster. This allows you to develop, experiment, test, and even deploy for free. When you need more power, or when you want dedicated support — or if you want to discuss a specific use case — you can contact MemSQL.

InApps is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: SingleStore.

Source: InApps.net
