Find the Right Metric for a Prediction Model – InApps Technology 2022

March 30, 2022 by Phu Nguyen


Why Are Numeric Scoring Metrics Needed?

Maarit Widmann

Maarit Widmann is a data scientist at KNIME. She started with quantitative sociology and holds her bachelor’s degree in social sciences. The University of Konstanz made her drop the “social” as a Master of Science. Her ambition is to communicate the concepts behind data science to others in videos and blog posts. Follow Maarit on LinkedIn. For more information on KNIME, please visit www.knime.com and the KNIME blog.

Numeric prediction models have many consequences in the real world, from the decisions of portfolio managers to the pricing of electricity at different times of the day, week and year. Numeric scoring metrics are needed in order to:

Select the most accurate model
Estimate the real-world impact of the error of the model

In this article, we will describe five real-world use cases of numeric prediction models, and in each use case, we measure the prediction accuracy from a slightly different point of view. In one case, we measure whether a model has a systematic bias, and in another, we measure a model’s explanatory power. The article concludes with a review of the numeric scoring metrics, showing the formulas to calculate them, and a summary of their properties. We’ll also link to a few example implementations of building and evaluating a prediction model in KNIME Analytics Platform.

Five Metrics: Five Different Perspectives on Prediction Accuracy

(Root) Mean Squared Error, (R)MSE – Which model best captures the rapid changes in the volatile stock market?

In Figure 1, below, you see the development of the LinkedIn closing price from 2011 to 2016. Within the time period, the behavior includes sudden peaks, sudden lows, longer periods of increasing and decreasing value, and a few stable periods. Forecasting this kind of volatile behavior is challenging, especially in the long term. However, for the stakeholders of LinkedIn, it’s valuable. Therefore, we prefer a forecasting model that captures the sudden changes to a model that performs well on average over the period of five years.

We select the model with the lowest (root) mean squared error because this metric squares each error before averaging, so large errors weigh more heavily than small ones. It therefore favors a model that can react to short-term changes and save the stakeholders money.

Figure 1. LinkedIn daily stock market closing price from 2011 to 2016: data with few regular patterns and many sudden changes with low forecastability. We select the forecasting model with the lowest (root) mean squared error because it weights the big forecast errors more and favors a model that can capture the sudden peaks and lows.
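To illustrate how squaring the errors changes the ranking of two models, here is a minimal Python sketch. The prices and both forecasts are made-up numbers chosen for illustration, not the LinkedIn data from Figure 1, and the function names are our own.

```python
import math

def mse(actual, predicted):
    """Mean squared error: the average of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: MSE expressed in the unit of the target."""
    return math.sqrt(mse(actual, predicted))

def mae(actual, predicted):
    """Mean absolute error, included only for comparison."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical closing prices with a sudden peak on the fourth day.
actual = [100.0, 102.0, 101.0, 120.0, 103.0]
# Model A is exact on calm days but misses the peak completely.
model_a = [100.0, 102.0, 101.0, 102.0, 103.0]
# Model B is off by 4 every day but follows the peak.
model_b = [104.0, 106.0, 105.0, 116.0, 107.0]

for name, forecast in (("A", model_a), ("B", model_b)):
    print(name,
          "RMSE:", round(rmse(actual, forecast), 2),
          "MAE:", round(mae(actual, forecast), 2))
```

With these numbers, RMSE ranks model B first (4.0 versus about 8.05), while MAE would rank model A slightly ahead (3.6 versus 4.0). The single missed peak dominates the squared error, which is the behavior we want when sudden changes matter most.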

Mean Absolute Error, MAE – Which model best estimates the energy consumption in the long term?

In Figure 2, you can see the hourly energy consumption values in July 2009 in Dublin, collected from a cluster of households and industries. The energy consumption shows a relatively regular pattern, with higher values during working hours and on weekdays and lower values at night and during weekends. This kind of regular behavior can be forecasted relatively accurately, allowing for long-term planning of the energy supply. Therefore, we select the forecasting model with the lowest mean absolute error. We do this because it weights each error in proportion to its size instead of squaring it, is therefore more robust to outliers than (R)MSE, and shows which model has the highest forecast accuracy over the whole time period.

Figure 2. Hourly energy consumption values in June 2009 in Dublin, collected from a cluster of households and industries. The data shows a relatively regular behavior and can therefore be forecasted in the long term. We select the forecasting model with the lowest mean absolute error because this metric is robust to outliers.
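The robustness to outliers can be checked with a small Python sketch. The consumption values below are hypothetical (they are not the Dublin data), and the flat forecast of 50 is only a stand-in for a model that captures the regular pattern but not a one-off spike.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, shown for comparison."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Hypothetical hourly consumption; the last hour contains an unusual spike.
actual = [50.0, 52.0, 51.0, 49.0, 50.0, 90.0]
predicted = [50.0, 50.0, 50.0, 50.0, 50.0, 50.0]

# The same series without the spike.
regular_actual, regular_predicted = actual[:-1], predicted[:-1]

mae_growth = mae(actual, predicted) / mae(regular_actual, regular_predicted)
rmse_growth = rmse(actual, predicted) / rmse(regular_actual, regular_predicted)
print("MAE grows by a factor of", round(mae_growth, 1))
print("RMSE grows by a factor of", round(rmse_growth, 1))
```

In this toy series the single spike inflates MAE by a factor of roughly 9 and RMSE by a factor of roughly 15, so one unusual hour distorts the MAE-based comparison noticeably less.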

Mean Absolute Percentage Error, MAPE – Are the sales forecasting models for different products equally accurate?

On a hot summer day, the supply of both sparkling water and ice cream should be guaranteed! We want to check if the two forecasting models that predict the sales of these two products are equally accurate.

Both models generate forecasts in the same unit, the number of sold items, but at a different scale since sparkling water is sold in much larger volumes than ice cream. In such a case, we need a relative error metric and use mean absolute percentage error, which reports the error relative to the actual value. In Figure 3, in the line plot on the left, you see the sales of sparkling water (purple line) and the sales of ice cream (green line) in June 2020 as well as the predicted sales of both products (red lines). The prediction line seems to deviate slightly more for sparkling water than for ice cream. However, the larger actual values of sparkling water distort the visual comparison. In fact, the forecasting model performs better for sparkling water than for ice cream, as reported by the MAPE values of 0.191 for sparkling water and 0.369 for ice cream.

Notice, though, that MAPE values can be biased when the actual values are close to zero. For example, the sales of ice cream are relatively low during the winter months compared to summer months, whereas sales of milk remain pretty constant through the entire year. When we compare the accuracies of the forecasting models for milk vs. ice cream by their MAPE values, the small values in the ice cream sales make the forecasting model for ice cream look unreasonably bad compared to the forecasting model for milk.

In Figure 3, in the line plot in the middle, you see the sales of milk (blue line) and ice cream (green line) and the predicted sales of both products (red lines). If we take a look at the MAPE values, the forecasting accuracy is apparently much better for milk (MAPE = 0.016) than for ice cream (MAPE = 0.266). However, this huge difference is due to the low values of ice cream sales in the winter months. The line plot on the right in Figure 3 shows exactly the same actual and predicted sales of ice cream and milk, with the ice cream sales shifted up by 25 items for each month. Without the bias from the values close to zero, the forecasting accuracies for ice cream (MAPE = 0.036) and milk (MAPE = 0.016) are now much closer to each other.

Figure 3. Three line plots showing actual and predicted values of ice cream and sparkling water (line plot on the left) and ice cream and milk (line plots in the middle and on the right). In the line plot on the right, the ice cream sales values are shifted up by 25 in order to avoid the bias in mean absolute percentage error introduced by small actual values.
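The effect of small actual values on MAPE can be reproduced with a short Python sketch. The monthly sales figures are invented for illustration and do not reproduce the MAPE values reported above. The forecast is off by exactly one item every month, and the shift of 25 mirrors the adjustment described for the right-hand plot.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.1 = 10%)."""
    if any(a == 0 for a in actual):
        # The relative error is undefined when the actual value is zero.
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical monthly ice cream sales, very low in the winter months.
ice_cream_actual = [2.0, 3.0, 10.0, 40.0, 60.0, 80.0]
# The forecast is off by exactly one item in every month.
ice_cream_forecast = [3.0, 4.0, 11.0, 41.0, 61.0, 81.0]

# Shift actual and predicted values up by the same constant.
shift = 25.0
shifted_actual = [a + shift for a in ice_cream_actual]
shifted_forecast = [p + shift for p in ice_cream_forecast]

print("MAPE, original:", round(mape(ice_cream_actual, ice_cream_forecast), 3))
print("MAPE, shifted: ", round(mape(shifted_actual, shifted_forecast), 3))
```

The absolute errors are identical in both cases, yet the MAPE drops sharply after the shift because the winter months no longer divide by values close to zero.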

Mean Signed Difference – Does a running app provide unrealistic expectations?

A smartwatch can be connected to a running application which then estimates the finishing time in a 10k run. It could be that, as a motivator, the app estimates a faster finishing time than can realistically be expected.

To test this, we collect the estimated and realized finishing times from a group of runners for six months and plot the average values in the line plot in Figure 4. As you can see, during the six months, the realized finishing time (orange line) decreases more slowly than the estimated finishing time (red line). We confirm the systematic bias in the estimates by calculating the mean signed difference, that is, the average of the estimated minus the actual finishing times. It’s negative (-2.191), so the app indeed raises unrealistic expectations! Notice, though, that this metric is not informative about the magnitude of the error: if a runner actually runs faster than the estimated time, this positive error offsets part of the negative errors.

Figure 4. Estimated (red line) and realized (orange line) finishing times in a 10k run in the period of six months. The estimated times are biased downwards, also shown by the negative value of mean signed difference.
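A minimal Python sketch of the mean signed difference follows. It assumes the convention of estimated minus actual, which is the reading under which a negative value means that the app’s estimates are too low. The finishing times are hypothetical and the function name is our own.

```python
def mean_signed_difference(actual, predicted):
    """Average of (predicted - actual); the sign shows the direction of the bias."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error, to show the magnitude that the signed mean hides."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical monthly average finishing times in minutes.
realized = [55.0, 54.5, 54.0, 53.5]
estimated = [53.0, 52.0, 51.5, 51.0]

bias = mean_signed_difference(realized, estimated)
print("Mean signed difference:", bias)  # negative: estimates are too optimistic

# Errors of opposite sign cancel out: no bias, but still a sizable error.
two_runs_actual = [50.0, 50.0]
two_runs_estimated = [53.0, 47.0]
print("Signed:", mean_signed_difference(two_runs_actual, two_runs_estimated))
print("Absolute:", mae(two_runs_actual, two_runs_estimated))
```

The second example is why the mean signed difference is usually read together with a magnitude metric such as MAE: a value of zero says only that over- and underestimates balance out, not that the estimates are accurate.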

R-squared – How much of our years of education can be explained through access to literature?

In Figure 5, you can see the relationship between the access to literature (x-axis) and years of education (y-axis) in a sample of the population. A linear regression line is fitted to the data to model the relationship between these two variables. To measure the fit of the linear regression model, we use R-squared.

R-squared tells how much of the variance of the target column (years of education) the model explains. Based on the R-squared value of the model, 0.76, the access to literature explains 76% of the variance in the years of education.

Figure 5. Linear regression line modeling the relationship between access to literature and years of education. R-squared is used to measure the model fit, i.e., how much of the variance in the target column (years of education) can be explained by the model, 76% in this case.
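The calculation behind R-squared can be sketched in a few lines of Python. The data points are made up and do not reproduce the value of 0.76 from Figure 5. The line is fitted with ordinary least squares, which we assume is the fitting method used for the regression line, and the helper names are our own.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical access-to-literature score (x) and years of education (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [8.0, 11.0, 10.0, 14.0, 15.0]

slope, intercept = fit_line(x, y)
fitted = [slope * xi + intercept for xi in x]
r2 = r_squared(y, fitted)
print("R-squared:", round(r2, 3))
```

A model that always predicts the mean of the target gets an R-squared of 0, and a model that predicts every value exactly gets 1, so the value can be read as the improvement over the mean as a baseline.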

A Review of the Five Numeric Scoring Metrics

The numeric scoring metrics introduced above are shown in Figure 6. The metrics are listed along with the formulas used to calculate them and a few key properties of each. In the formulas, y_i is the actual value and f(x_i) is the predicted value.

Figure 6. Common numeric scoring metrics, their formulas, and key properties. In the formulas, y_i is the actual value, f(x_i) is the forecasted value, and n is the sample size.
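For readers without access to the figure, the metrics can be written out in their standard textbook form, using the same notation (y_i for the actual value, f(x_i) for the predicted value, n for the sample size). This is a reconstruction from the standard definitions rather than a copy of Figure 6. The sign convention for the mean signed difference (predicted minus actual) is an assumption that matches the running app example, and the mean of the actual values, written as a barred y, is a symbol introduced here.

```latex
\begin{align*}
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2 \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2} \\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\bigl|y_i - f(x_i)\bigr| \\
\mathrm{MAPE} &= \frac{1}{n}\sum_{i=1}^{n}\frac{\bigl|y_i - f(x_i)\bigr|}{|y_i|} \\
\mathrm{MSD}  &= \frac{1}{n}\sum_{i=1}^{n}\bigl(f(x_i) - y_i\bigr) \\
R^2           &= 1 - \frac{\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2}{\sum_{i=1}^{n}\bigl(y_i - \bar{y}\bigr)^2},
\qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
\end{align*}
```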

Summary

In this article, we’ve introduced the most commonly used error metrics and the perspectives that they provide on a model’s performance.

It’s often recommended to take a look at multiple numeric scoring metrics to gain a comprehensive view of the model’s performance. For example, by reviewing the mean signed difference, you can see if your model has a systematic bias, whereas by studying the (root) mean squared error, you can see which model best captures the sudden fluctuations. Visualizations, such as a line plot, complement the model evaluation.
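One way to review several metrics side by side is to compute them in a single pass, as in the following Python sketch. The function name `score`, the dictionary keys, and the sample values are our own choices for illustration. The mean signed difference again assumes predicted minus actual, and MAPE requires that no actual value is zero.

```python
import math

def score(actual, predicted):
    """Return the five numeric scoring metrics for one set of predictions."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]  # predicted minus actual
    mean_actual = sum(actual) / n
    mse = sum(e ** 2 for e in errors) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        # Assumes that no actual value is zero.
        "mape": sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
        "mean_signed_difference": sum(errors) / n,
        "r_squared": 1.0 - (mse * n) / ss_tot,
    }

# Hypothetical actual and predicted values.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 17.0]

report = score(actual, predicted)
for metric, value in report.items():
    print(f"{metric}: {value:.3f}")
```

Reading the values together gives a fuller picture than any single number. In this toy example the positive mean signed difference points to a slight tendency to overestimate, while MAE and RMSE describe how large the errors are regardless of their direction.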

For a practical implementation, take a look at the example workflows built in the visual data science tool KNIME Analytics Platform.

Download and inspect these free workflows from the KNIME Hub:


Source: InApps.net


Phu Nguyen

As a Senior Tech Enthusiast, I bring a decade of experience to the realm of tech writing, blending deep industry knowledge with a passion for storytelling. With expertise in software development to emerging tech trends like AI and IoT—my articles not only inform but also inspire. My journey in tech writing has been marked by a commitment to accuracy, clarity, and engaging storytelling, making me a trusted voice in the tech community.