Data and Decisions: What Is Your Data Really Telling You?

March 22, 2022 by Phu Nguyen

AI and Analytics: What Is the Data Telling You?

When you mine data for insights or to automate a decision, whether through machine learning or data analytics, it pays to be careful about how you frame each question. You need to know how the data was collected and to keep asking yourself, “Does the data represent what I think it does?”

Take this real-world example from machine learning: A data scientist was tasked with building a recommendation system for an online video streaming service. This data scientist was experienced with developing recommenders and knew to look at what people do rather than what they say they like (behavior over reported ratings) to discover preferences. In this case, the data to be used for training the recommender was the videos people clicked on — this was the behavior used to reveal preferences. Surprisingly, however, the results were poor; the recommendation system did not perform well although it used approaches that had been successful in the past.

The solution was to directly inspect a broader set of viewer behavior data and rethink the assumptions about what the data represented. It turned out that treating a click on a video title as the indicator of preference wasn’t a good idea. In many cases, people selected a title but quickly clicked away, often because the title did not match the content, whether through error or spamming. Using a different target for training (watching the first 30 seconds of a video rather than just clicking on it) resulted in a video recommendation system that worked beautifully.
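The change of training target described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ViewEvent` record, its field names, and the sample events are hypothetical and not taken from the streaming service in the story. Only the 30-second threshold comes from the article.

```python
from dataclasses import dataclass

# Hypothetical event record; the field names are assumptions for illustration.
@dataclass
class ViewEvent:
    user_id: str
    video_id: str
    seconds_watched: float

MIN_WATCH_SECONDS = 30.0  # the "first 30 seconds" threshold from the article

def click_labels(events):
    """Every click counts as a positive signal (the misleading target)."""
    return {(e.user_id, e.video_id) for e in events}

def watch_labels(events, min_seconds=MIN_WATCH_SECONDS):
    """Only clicks followed by at least min_seconds of viewing count as positive."""
    return {(e.user_id, e.video_id) for e in events if e.seconds_watched >= min_seconds}

# Made-up events, for demonstration only.
events = [
    ViewEvent("u1", "v1", 4.0),    # clicked, then left almost immediately
    ViewEvent("u1", "v2", 310.0),
    ViewEvent("u2", "v1", 2.5),
    ViewEvent("u2", "v3", 45.0),
]

clicks = click_labels(events)
watched = watch_labels(events)
bounced = clicks - watched  # clicks that never turned into real viewing
print(f"{len(bounced)} of {len(clicks)} clicks were abandoned early")
```

Comparing the two label sets before training anything is a cheap way to see how often a click fails to reflect real interest.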

The lesson here is not about video but about the importance of keeping an open mind to what your data tells you, trying different approaches and continually questioning your assumptions. It’s also an example of what newly trained data scientists discover: Data in the real world is not as clean and straightforward as the carefully selected data sets often used in machine learning classes.

Clearly, potential pitfalls exist in data selection and in framing the question you are addressing. So, what can you do about that?

Avoiding the Pitfalls: Tips for Better Data Science

No specific set of steps is guaranteed to avoid these problems. Much of the ability to avoid pitfalls comes through experience and being generally suspicious about your own assumptions. Just being alert to the potential for data to be misleading is already a step in the right direction. And a number of practices can help you better develop your skills and instincts on how to approach these issues. In addition to working on a system with efficient data management and data engineering, keep in mind these tips about data and decisions:

Plan time for data exploration, and talk to domain experts to find out more about how the data was collected, its known defects, what the labels mean, and what other related data may be available or could be collected.
Look at the issue in more than one way. If different types of data lead you to the same conclusions, your confidence level should increase. Similarly, try predicting some variables based on others. This helps you understand if the data is self-consistent.
Ask yourself, or others who have tried similar approaches, if the results are roughly what you expect. A model that behaves much better or much worse than expected should be a warning flag to go back and re-examine data as well as how the question is framed. It isn’t always the case that outlier results are bogus — you might have built an extraordinary system! But it is a good idea to recheck the process if models behave in particularly surprising ways.
Consider injecting synthetic data as a test of your system. Physicists working on particle accelerators and large-scale astronomical studies do something similar. They inject sample signals or known kinds of noise into their data to verify their analysis methods can robustly detect the injected samples.
Try randomizing a data source you use for training. If this doesn’t change your results, then your model is not using that data the way you think it is.
If possible, shadow real users as they go about the behaviors of interest. Now that you know what they actually do, verify their actions are reflected in the data you plan to use. This is a great way to reveal faulty assumptions or misleading aspects of data collection.
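
Two of the tips above, randomizing a training source and injecting synthetic signals, lend themselves to a quick script. The following is a minimal sketch using only the Python standard library. The toy threshold classifier, the toy spike detector, and all of the data are made up for illustration; in practice you would substitute your own model and data.

```python
import random
from statistics import mean, pstdev

def fit_threshold(xs, ys):
    """Toy one-feature classifier: cut halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(pos) + mean(neg)) / 2

def accuracy(threshold, xs, ys):
    """Fraction of rows where 'x above threshold' agrees with the label."""
    return mean(1.0 if (x > threshold) == (y == 1) else 0.0 for x, y in zip(xs, ys))

def holdout_accuracy(xs, ys):
    """Fit on the first half of the rows, score on the second half."""
    half = len(xs) // 2
    threshold = fit_threshold(xs[:half], ys[:half])
    return accuracy(threshold, xs[half:], ys[half:])

def detect_spikes(series, z=4.0):
    """Toy detector: flag points more than z standard deviations from the mean."""
    mu, sd = mean(series), pstdev(series)
    return {i for i, v in enumerate(series) if abs(v - mu) > z * sd}

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Made-up data in which the feature is genuinely informative about the label.
ys = [i % 2 for i in range(2000)]
xs = [rng.gauss(2.0 if y == 1 else 0.0, 1.0) for y in ys]

# Check 1: randomize the data source. Shuffling breaks the link between
# feature and label, so held-out accuracy should fall to roughly chance.
shuffled_xs = xs[:]
rng.shuffle(shuffled_xs)
real_acc = holdout_accuracy(xs, ys)
shuffled_acc = holdout_accuracy(shuffled_xs, ys)
print(f"real: {real_acc:.2f}, shuffled: {shuffled_acc:.2f}")

# Check 2: inject synthetic signals at known positions and confirm that
# the analysis recovers them.
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
injected_at = {100, 400, 900}
with_signal = [v + 10.0 if i in injected_at else v for i, v in enumerate(noise)]
found = detect_spikes(with_signal)
recall = len(found & injected_at) / len(injected_at)
print(f"recovered {recall:.0%} of injected spikes")
```

If the shuffled score stays close to the real one, or the injected spikes go undetected, that is a cue to re-examine the pipeline before trusting its results.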

A Final Data Science Example

A few years ago, I was at a conference, signing and giving away books with my co-author, Ted Dunning. We gave people a choice between “Practical Machine Learning: Innovations in Recommendation” and “A New Look at Anomaly Detection,” but each person could take only one book. We were surprised that over 80% chose the book on recommendations over the one on anomaly detection. Then a thought occurred to me. I leaned over and whispered “dog food” to Ted. He swapped the positions of the books.

Turns out, data scientists prefer the book on the left.

If you’d like to read our latest short book, download a free PDF courtesy of HPE: AI and Analytics at Scale: Lessons from Real World Production Systems.

InApps Technology is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Real.

Lead image via Shutterstock.

Source: InApps.net
