Karun Bakshi
Karun Bakshi is vice president of product marketing at Delphix. Bakshi loves imagining, building and talking about software-driven innovation. He has spent his career in software, working in nearly all of its associated capacities (engineering, product management, evangelism, partnerships, business development and product marketing) at companies including Lockheed Martin, Oracle, Microsoft, Pivotal and Delphix. Whether it's discussing corner cases of algorithms or go-to-market strategy, he's game if you are.

It's a story we've seen time and again: software tends to fail when it does not accurately account for reality. We saw it nearly two decades ago with the Y2K scare, and we saw it earlier this year when the New York Stock Exchange had to suspend trading on stocks using four digits. These are tales of data-related defects, where software systems break down because unanticipated incoming data exercises the software in unexpected ways. Such seemingly small defects are often incredibly costly and surprisingly common. In fact, I'm sure most organizations have dealt with the fallout of a data-related defect in some form or another.

Whereas the earlier examples are well known for their failure to model (future) reality well, a much more common and mundane scenario these days is the failure to robustly manage the dynamic complexity of data states that can exist in a software system over time. That new customer field you added to your application was probably well tested in isolation. But have you fully tested how it interacts with previous modules you wrote, or with subsystems developed by other teams, in all the dynamic richness of real life? Chances are, no.

Collectively, we've gotten into a bad habit of using synthetic data, which is by definition unrealistic, resulting in a lot of poorly built, fragile applications that don't accurately reflect reality. A simple oversight like this can cost companies substantial money, time, credibility, opportunity and users.

Synthetic data works well if you don't have access to real data (e.g., prior to an initial launch, or for an installed app) or if the system is simple enough that its various states are easily understood and handled. Today, however, most apps are SaaS, and even simple apps are quite complex because they interact with multiple backend systems. So production data, a far more accurate reflection of reality, is readily available. Nevertheless, many of us feel compelled to keep using unrealistic, synthetic data. For most of us, it seems there is little choice: using unrealistic data is often a necessity due to time, security and technical constraints.

How Did We Get Here? Synthetic Data’s Slippery Slope

With the world awash in the race to digital transformation, time to market (and, consequently, agile development) has become paramount. Starting with synthetic data for testing makes sense when you build a new app or a new feature. But as the app becomes more complex, our testing approach remains stagnant. It's easy to build simple test cases with synthetic data; relevant test data for more complex tests is time-consuming and painstaking to construct synthetically. One of the most common reasons we've come to rely on unrealistic data is the constant need to make up time in the development cycle.

The other piece of the puzzle is security and data privacy. If you're building something that handles sensitive information, using real user data as part of the development cycle can be incredibly powerful when it comes to modeling customers' needs and behaviors. But of course, in today's age of data breaches, we can't sacrifice consumer privacy to leverage it.

Facebook's Cambridge Analytica scandal is a stark reminder that data privacy is of paramount importance and that lapses carry significant consequences. So directly visible personally identifiable information (PII), protected health information (PHI) or other sensitive information cannot be part of the standard testing modus operandi. And so we settle for synthetic data as a proxy for sensitive user information.

If we want to build dependable, scalable, high-performance applications, the days of cutting corners and faking data are over. Modern development requires realistic test data to be delivered with speed and security. I know what you’re thinking: that’s easier said than done. But it is possible to deliver speed and maintain data privacy with realistic data today, and it’s the way the future is moving.

So, What’s the Alternative? A Reality Check

The alternative to synthetic data is real data: production data. Many organizations turn to copies of production data. However, creating a copy of production data can be a frustratingly slow process when it depends on an IT ticket sitting in a three-week backlog. Creating an obfuscated copy of production data that hides sensitive data while preserving business value (e.g., a social security number still has nine digits, referential integrity is maintained, etc.) can be similarly time-consuming and adds further delay. Moreover, most organizations do these activities in manual, ad hoc ways that are fraught with errors and delays, limiting their ability to do this frequently and consistently.
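To make the "preserving business value" point concrete, here is a minimal sketch of deterministic, format-preserving masking in Python. It is an illustration only, not any vendor's implementation: the key, column names and helper functions are assumptions, but it shows how the same real social security number can always map to the same nine-digit pseudonym, so joins across tables still line up after masking.

```python
# Sketch only: deterministic, format-preserving masking of SSNs.
# The key and column names below are illustrative assumptions.
import hmac
import hashlib

SECRET_KEY = b"rotate-me-and-keep-out-of-source-control"  # hypothetical key

def mask_ssn(ssn: str) -> str:
    """Map a real SSN to a stable nine-digit pseudonym in XXX-XX-XXXX form."""
    digits = ssn.replace("-", "")
    digest = hmac.new(SECRET_KEY, digits.encode(), hashlib.sha256).hexdigest()
    fake = f"{int(digest, 16) % 10**9:09d}"       # keep exactly nine digits
    return f"{fake[:3]}-{fake[3:5]}-{fake[5:]}"   # keep the original shape

def mask_rows(rows, sensitive_cols=("ssn",)):
    """Mask sensitive columns in a list of dict rows, leaving other fields intact."""
    masked = []
    for row in rows:
        clean = dict(row)
        for col in sensitive_cols:
            if clean.get(col):
                clean[col] = mask_ssn(clean[col])
        masked.append(clean)
    return masked

if __name__ == "__main__":
    customers = [{"id": 1, "ssn": "123-45-6789"}]
    claims = [{"claim_id": 77, "customer_ssn": "123-45-6789"}]
    print(mask_rows(customers))
    print(mask_rows(claims, sensitive_cols=("customer_ssn",)))
    # The same real SSN masks to the same fake value in both tables,
    # so referential integrity is preserved.
```

Because the mapping is keyed and deterministic, the masked copy keeps the statistical and relational shape that makes production data valuable for testing, without exposing the underlying PII.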

Rising to the challenge, the DataOps movement has emerged to bring discipline and efficiency to the flow of data in the modern enterprise. As much a cultural movement bridging the needs of data consumers and data managers as a technology play, a DataOps practice and platform can bring speed, security, consistency and automation to the provisioning of (test) data across the enterprise.

A mature DataOps approach will comprise several key elements. It should seamlessly integrate and scale with the heterogeneous enterprise IT landscape and work with all relevant data wherever it exists (SQL/NoSQL, cloud/on-prem, etc.). It should facilitate data capture, processing and delivery in a form that consumers can use on a day-to-day basis with minimal overhead or delay. And, finally, it must proactively identify and mitigate risk as data flows across the enterprise.

With DataOps, production data delivery can be automated to accelerate application development and testing, delivering both speed and security. Without it, teams are left using stale or high-risk datasets, or waiting on provisioning and refreshes.

Through secure, self-service and automated access to data, developers can accelerate their workflows, receiving data when and where they need it to build dependable, scalable, high-performance applications. Integrated with modern CI/CD pipelines, DataOps automation can remove one of the few remaining sources of friction in software delivery: data environment provisioning.
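As a rough sketch of what that CI/CD integration might look like, the snippet below provisions a masked data environment as a pytest fixture before integration tests run. The DATAOPS_API endpoint, its /environments contract and the masking profile name are hypothetical stand-ins for whatever your DataOps platform actually exposes.

```python
# Sketch only: the DataOps endpoint and payload shape are assumptions,
# not a real product API.
import os
import pytest
import requests

DATAOPS_API = os.environ.get("DATAOPS_API", "https://dataops.internal.example/api")

@pytest.fixture(scope="session")
def masked_db_url():
    """Request a fresh, masked copy of production data from the (hypothetical)
    DataOps service, hand its connection string to the tests, then tear it down."""
    resp = requests.post(
        f"{DATAOPS_API}/environments",
        json={"source": "prod-orders-db", "masking_profile": "pii-standard"},
        timeout=300,
    )
    resp.raise_for_status()
    env = resp.json()
    yield env["connection_url"]
    requests.delete(f"{DATAOPS_API}/environments/{env['id']}", timeout=300)

def test_order_totals(masked_db_url):
    # A real integration test would connect to masked_db_url and exercise the app;
    # this placeholder just shows the fixture wiring.
    assert masked_db_url
```

In a setup like this, the CI job simply runs pytest: provisioning a realistic, de-identified database becomes a fixture that takes minutes rather than a ticket that takes weeks.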

Data-related defects are far too common and costly in the modern enterprise. It’s time to stop cutting corners and leave “fake data” behind for good. It will take some work to get there, but it is possible to achieve speed and security when accessing realistic production data. When you do, data flows easily and securely, and great things happen.

Feature image via Pixabay.