What is Data Freshness? Definition, Examples, and Best Practices
If you care about whether your business succeeds or fails, you should care about data freshness. Fresh data is important because it has a huge impact on your bottom line. Unfortunately, that impact often goes undetected—until it’s too late.
Say your business uses data for operational purposes, and your data is stale, you could inadvertently send a discount code to a cohort of customers who already purchased your solution, inviting them to demand the same deal terms, costing you thousands of dollars.
If your business uses data for decision-making purposes, on the other hand, and your data is stale, you could underreport your return on ad spend, causing you to withdraw an investment that is actually paying dividends in reality.
Now that you know why data freshness matters, let’s dive into exactly what it means. In this blog post, you’ll find a definition, examples, and four methods for measuring data freshness.
What is data freshness?
Data freshness, sometimes called data up-to-dateness, is one of ten dimensions of data quality. Data is considered fresh if it describes the real world right now. This data quality dimension is closely related to the timeliness of the data but is compared against the present moment, rather than the time of a task.
What are some examples of stale data?
Imagine that you’re part of the data team, specifically part of the data engineering team, which includes creating data pipelines for downstream stakeholders. Just as an example, we’ll be using a common use case of pulling data from the Google Ads and Google Analytics APIs into your data warehouse as part of your core data sources being used to define marketing attribution. While some use cases may require constant data processing to achieve real-time decisions; if we’re looking for marketing attribution, a common minimum acceptable cadence for data refreshes can be safely set at daily updates - any refresh cadence beyond that would be considered "stale" data.
The above is one way that the definition of “stale data” can change dependent on your internal data management agreements with other stakeholders. Continuing with the example, if your team shifts from ad-hoc SQL queries to adopting dbt to chain together schema and model dependencies in the pursuit of speed, your once acceptable daily updates slip quickly into being considered as “stale data”.
How do you measure data freshness?
To test any data quality dimension, you must measure, track, and assess a relevant data quality metric. In the case of data freshness, you can measure the difference between latest timestamps against the present moment, the difference between a destination and a source system, verification against an expected rate of change, or corroboration against other pieces of data.
{{inline-a}}
How to ensure data freshness
One way to ensure data freshness is through anomaly detection, sometimes called outlier analysis, which helps you to identify unexpected values or events in a data set. Data Observability tools include anomaly detection as part of the core functionality, and can find not only data freshness, but also other dimensions such as completeness or consistency, for you to scale data quality measures across your data warehouse.
Using the example of a stale number of products sold, anomaly detection software would notify you instantly if the frequency at which the table was updated was outside of the normal range. The software knows it’s an outlier because its machine learning model learns from your historical metadata.
Here’s how anomaly detection helps Andrew Mackenzie, Business Intelligence Architect at Appcues, perform his role:
“The important thing is that when things break, I know immediately—and I can usually fix them before any of my stakeholders find out.”
In other words, you can say goodbye to the dreaded WTF message from your stakeholders. In that way, automated, real-time anomaly detection is like a friend who has always got your back.
To take anomaly detection for a spin and put an end to poor data quality, sign up for Metaplane’s free-forever plan or test our most advanced features with a 14-day free trial. Implementation takes under 30 minutes.
Table of contents
Tags
...
...