How to build a live time series anomaly detection model

Need to know when something’s off? Live time series models can detect anomalies for any metric in your data stack. Here’s how to build one yourself.

Founding Data Scientist @ Metaplane
August 23, 2024

How do you detect when a dbt job is lagging?

How do you get early warnings of problems in ETL pipelines? 

How do you answer questions like these?

Sure, you can track dozens of metrics to try and make sense of what’s happening in your data stack, but once you have them… what do you actually do with them?

Blindly tracking metrics is easy. Thoughtfully using them is surprisingly hard: you need to constantly review them and decide whether or not they’re healthy using contextual knowledge about your stack. Doing this by hand for hundreds of metrics is time-consuming—if not impossible—and a waste of effort if those metrics aren’t doing anything interesting. 

That’s why we need tools to flag us when our once-consistent metrics start looking inconsistent.

Be the first to know, every time

The data team needs to know as soon as any metric looks irregular. Like, if a dbt job that usually takes 20 minutes now takes over an hour, or a regular Fivetran table sync that usually adds 1000 rows now only adds 50.

It might be just a weird fluke, but it might be a signal that something’s broken. If you know immediately when something looks not-quite-right, you can:

  1. Figure out if there’s a problem at all
  2. If so, fix the problem before it has major downstream effects

Live time series anomaly detection allows you to do this for any metric in your data stack. But what is it exactly? Let’s break it down:

  • Live (aka real-time or online): Using a statistical model on the most recent data (e.g., evaluating the latest dbt job run time).
  • Time series: A dataset that is indexed by time or a statistical model for such data (e.g., a dbt job’s run times over the past month).
  • Anomaly detection: Flagging some data in a dataset as being unlike the rest (e.g., flagging an unusually long run time for one dbt job run).

Why time series?

Anomaly detection methods that use time-sensitive information perform much better than models that don’t—especially when you’re observing metrics that change over time (like most data warehouse metrics, which contain trends and cycles).

Trends change over time

Let’s say we track run times for a dbt job, and we ignore the time index. They have a distribution like this, with run times ranging from 10 minutes to 28 minutes, but most taking around 20:

Nothing out of the ordinary with the cross-sectional distribution for dbt job runs, right?

If I told you that the last dbt job run took 25 minutes, would you consider that anomalous? Probably not—it falls neatly into the above distribution. But let’s look at the same data using the time index:

See how those same dbt job run times look a bit different when you add a timestamp?

While a 25-minute dbt run might not have been anomalous in the past, you can see at the right edge of the chart that the run only takes about 15 minutes now. The run time of this job has been decreasing as the computing power assigned to the job has increased. So, if a new run did take 25 minutes, you might think, “Hm, that doesn’t look quite right,”  and take the time to investigate.

Cycles (aka seasonality) need context

Imagine you have a Snowflake table that gets lots of inserts, but there are fewer on the weekend. 

There’s a clear weekly cycle: the number of rows added drops significantly on weekends, a pattern you can only recognize with the time index for context.

These weekend drop-offs aren't unexpected—they happen consistently each week. So, they're not really "anomalies" in the sense of something going wrong; they're just part of the regular cycle of how your system operates.

Now, if today were a Wednesday, and I told you that 7,000 rows were added in the last hour—no sweat. But if today were a Saturday? You’d probably want to look into it.

Is it an anomaly… or not?

Ultimately, we need a way to decide what counts as an anomaly and is worthy of our attention. Trends and cycles often carry a certain level of noise that can easily distract us from the real signals. Separating the normal noise from the “something’s wrong!” signal is key to identifying true anomalies. 

Most methods (including Metaplane’s) use thresholds to decide what counts as normal vs anomalous. Depending on the metric, you might use static, rule-based, or model-based thresholds.

Static thresholds for anomaly detection

The simplest anomaly detection strategy is to set a threshold at a single value, especially if you have a clear idea of what you consider an “unhealthy” value and expect that definition to apply indefinitely.

If a dbt job run usually takes 15 minutes, you might not care if it runs faster than that; you just want to know if it ever takes more than 30 minutes. 
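In code, a static threshold is about as simple as it gets. Here’s a minimal sketch of that 30-minute check; the send_alert function is a stand-in for whatever notification channel you actually use:

def send_alert(message: str) -> None:
    # Stand-in for your real notification channel (Slack, PagerDuty, email, ...)
    print(f"ALERT: {message}")

RUNTIME_THRESHOLD_MINUTES = 30  # static threshold: anything longer counts as unhealthy

def check_dbt_runtime(runtime_minutes: float) -> None:
    if runtime_minutes > RUNTIME_THRESHOLD_MINUTES:
        send_alert(f"dbt job took {runtime_minutes:.1f} minutes (threshold: {RUNTIME_THRESHOLD_MINUTES} min)")

check_dbt_runtime(42.0)  # fires an alert; a 20-minute run would not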

This kind of static-threshold approach works well in scenarios where your system is stable and predictable, and you have a strong sense of what constitutes an outlier. It’s straightforward, so you can focus on only the most critical deviations without getting bogged down by minor variations.

But if your metrics vary over time (or when you're dealing with a dynamic system where "normal" values aren't so easy to pin down), static thresholds won’t work for your use case. You’ll either miss important anomalies or get way too many false positive flags.

Rule-based thresholds for anomaly detection

Sometimes, you need an approach that recognizes what constitutes "normal" isn't static; it can vary based on the context. That’s where rule-based thresholds come in. 

A rule-based threshold allows you to set more nuanced criteria—or rules—that reflect the inherent rhythms of your data. They’re generally better aligned with your data’s real-world behavior, so they typically reduce noise and focus your attention on the deviations that genuinely matter. 

Good news: the complexity of your data doesn’t necessarily require equally complex solutions. Most cases can be covered by simple rules.

If you’re monitoring the number of rows in a BigQuery table that should only get inserts, maybe you just want to know if the row count ever decreases, so you set a threshold at the latest value.
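Here’s a minimal sketch of that insert-only rule, where the latest observed count becomes the moving threshold (the row counts are illustrative):

def row_count_decreased(previous_count: int, current_count: int) -> bool:
    # Rule: an insert-only table should never lose rows, so any drop below
    # the latest observed count is treated as anomalous
    return current_count < previous_count

print(row_count_decreased(previous_count=120_000, current_count=125_000))  # False: rows were added
print(row_count_decreased(previous_count=125_000, current_count=118_500))  # True: rows disappeared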

Rule-based thresholds are based on clear, logical assumptions about how your data behaves. So while this approach works well when you can implement the rule with simple logic, the logic can become unwieldy or impossible in more complex cases.

Model-based thresholds for anomaly detection

In particularly complex cases, statistical time series models are the most powerful option for setting thresholds because they can handle trends, cycles, and other common patterns. They don't just set a fixed threshold or apply a simple rule; they learn from the data itself.

And while there are many sophisticated models to choose from, even simple models can provide good performance. A linear regression can handle many cases.

If your dbt job runs tend to be longer at the beginning of the month when more data needs processing, a linear regression can adjust its expectations accordingly.

By learning from historical data, these models can adapt to changing conditions. This is especially valuable in environments where "normal" is a moving target, and where traditional methods might either overfit (react too strongly to noise) or underfit (miss subtle but significant deviations).

DIY: Live time series model tutorial

Time series models are designed to handle various patterns in the data, such as trends over time, seasonal cycles, or even irregular events that occur with some predictability—that’s what makes them great for setting thresholds when simple rules won’t work. But they do require statistical knowledge to choose the right model, train it, and tune it correctly.

So let’s walk through the setup together. In this example, we’ll implement a linear regression model on some sample data. Here’s the step-by-step process for using a time series model to do anomaly detection on metrics as they come in:

  1. Download the data file from this link and import the libraries you’ll need.
import numpy as np
import pandas as pd
import statsmodels.api as sm
  2. Plot your metric data over time and choose an appropriate time series model to use.
# Load the sample data and compute seconds elapsed since the first run (the model's time index)
data = pd.read_csv('dbt_run_times.csv', parse_dates=["timestamp"], date_format="%m/%d/%y %H:%M:%S")
datetimes = pd.to_datetime(data["timestamp"])
data["total_seconds"] = (datetimes - datetimes.iloc[0]).dt.total_seconds()

# Plot the runtime metric over time to check for trends and cycles
data.plot(x="timestamp", y="runtime", xticks=[0, 192, 384], xlabel="Time of Run", ylabel="dbt Job Runtime (in minutes)")
💡 Pro tip: Consider adding other features in your model alongside the raw metric data, such as day of week or other metrics you might track.

The resulting data plot should look like this.
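Following that pro tip, here’s a minimal sketch of how you might add a day-of-week feature alongside the elapsed-time feature; it’s kept in separate variables so the rest of the walkthrough is unchanged, and whether it actually helps depends on your metric:

# Illustrative extra feature: day of week (0 = Monday ... 6 = Sunday)
day_of_week = data["timestamp"].dt.dayofweek
X_candidate = sm.add_constant(
    pd.concat([data["total_seconds"], day_of_week.rename("day_of_week")], axis=1).values
)
# In practice you'd exclude known anomalies before fitting, as in the next step
candidate_result = sm.OLS(endog=data["runtime"].values, exog=X_candidate).fit()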
  3. Create a training dataset from the historical data, excluding any past anomalies.
# Keep only the runs that are not flagged as past anomalies
training_data = data[~data["anomaly"]].copy()
training_data.plot(x="timestamp", y="runtime", xticks=[0, 190, 380], xlabel="Time of Run", ylabel="dbt Job Runtime (in minutes)")
The resulting training data plot should look like this.
💡 Pro tip: If you don’t know what past data were anomalies, you can start from the beginning of your metric history and loop forward through it with your model, detecting if each new point is an anomaly.

  4. Set hyperparameters to generate thresholds.
alpha = 0.01  # significance level: a smaller alpha gives wider (more tolerant) thresholds
💡 Pro tip: With linear regression, you will usually want to set an alpha level on the prediction interval.

  5. Train your model on the historical data, tuning the hyperparameters to maximize performance on anomaly detection.
# Regress runtime on elapsed seconds since the first run
X_train = training_data["total_seconds"].values
y_train = training_data["runtime"].values
X_train_sm = sm.add_constant(X_train)  # adds the intercept term

model = sm.OLS(endog=y_train, exog=X_train_sm)
result = model.fit()
  6. Generate prediction intervals for each point in the time series data.
def generate_prediction_intervals(ts, alpha):
    """
    Generate prediction intervals for each point in the time series data.

    Parameters:
    ts (pd.DataFrame): The time series data.
    alpha (float): Significance level for prediction interval.

    Returns:
    pd.DataFrame: DataFrame with predictions and intervals.
    """
    intervals = []

    # Iterate over the time series data, starting after a minimum training window
    for i in range(10, len(ts)):

        # Training set up to (but not including) the current point
        train = ts.iloc[:i]
        X_train = train["total_seconds"].values
        y_train = train["runtime"].values
        X_train_sm = sm.add_constant(X_train)

        # Fit model on training data
        model = sm.OLS(endog=y_train, exog=X_train_sm)
        result = model.fit()

        # Get latest point to generate prediction
        latest = ts.iloc[i]
        X_latest = latest["total_seconds"]
        X_latest_sm = [1, X_latest]

        # Generate prediction and interval stats
        pred = result.get_prediction(X_latest_sm)
        pred_summary = pred.summary_frame(alpha=alpha)

        # Extract interval bounds
        forecast, pred_int_lower, pred_int_upper = pred_summary[['mean', 'obs_ci_lower', 'obs_ci_upper']].iloc[0].values

        # Append results
        intervals.append({
            'Date': ts.index[i],
            'Actual': ts["runtime"].iloc[i],
            'Forecast': forecast,
            'Lower Threshold': pred_int_lower,
            'Upper Threshold': pred_int_upper
        })

    # Create DataFrame from intervals
    intervals_df = pd.DataFrame(intervals)

    return intervals_df


  7. Validate your model by backtesting it as follows. See what gets flagged as an anomaly!
backtested_data = generate_prediction_intervals(data, alpha)
backtested_data.plot(x='Date', y = ['Actual', 'Lower Threshold', 'Upper Threshold'])
💡 Pro tip: If there are no anomalies in the history, your tuned model should not flag any anomalies. If there are anomalies in the history, your tuned model should only flag those points.
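To list exactly which backtested points land outside their thresholds, rather than eyeballing the plot, a quick filter on the output does the trick:

# Points whose actual value falls outside the backtested prediction interval
flagged = backtested_data[
    (backtested_data["Actual"] < backtested_data["Lower Threshold"])
    | (backtested_data["Actual"] > backtested_data["Upper Threshold"])
]
print(flagged)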

  8. When a new metric value comes in, generate a forecast from the model for that timestamp and the associated thresholds.
# Hypothetical new metric value arriving one hour after the last observed run
new_value = 16.2
new_total_seconds = data["total_seconds"].iloc[-1] + 3600
X_pred = [1, new_total_seconds]
# Use the same alpha as before so these thresholds match the backtest
bounds = result.get_prediction(X_pred).summary_frame(alpha=alpha)[["obs_ci_lower", "obs_ci_upper"]]
bounds
Your new range should look like this.

If the new value is outside the thresholds, congratulations, you’ve detected your first anomaly!
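Using the bounds frame from the step above, that check is a single comparison:

# True when the new value falls outside the prediction interval
is_anomaly = not (bounds["obs_ci_lower"].iloc[0] <= new_value <= bounds["obs_ci_upper"].iloc[0])
print(is_anomaly)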

If the new value is within the thresholds, add it to your training dataset and retrain the model. Depending on the velocity of the metric data and the retraining cost, you might not want to do this after every new value, but you should do it frequently enough that your model reflects the current data patterns.
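Retraining is just refitting on the extended history. Here’s a minimal sketch that appends the new observation (the column values mirror the sample file used above) and refits the same regression:

# Append the new, non-anomalous observation to the history
new_row = pd.DataFrame([{
    "timestamp": data["timestamp"].iloc[-1] + pd.Timedelta(hours=1),
    "runtime": new_value,
    "anomaly": False,
    "total_seconds": new_total_seconds,
}])
data = pd.concat([data, new_row], ignore_index=True)

# Refit on everything that isn't flagged as an anomaly
train = data[~data["anomaly"]]
X = sm.add_constant(train["total_seconds"].values)
result = sm.OLS(endog=train["runtime"].values, exog=X).fit()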

An easier way to scale anomaly detection

Luckily, many time series models are not computationally intensive to train or generate predictions from. And once you’ve built a model for one metric, you can often reuse the same model structure for similar metrics, training one model per metric.

For example, if your linear regression model works well for the primary dbt job, you can use the same model structure for a nightly ETL job that also follows regular cycles. Of course, you'd need to retrain the model on the specific historical data of the ETL job so it can learn the patterns and behaviors of that specific job, but it cuts down the effort big time.
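In practice, reusing the structure can be as simple as looping over your metrics and fitting one regression per metric. A minimal sketch, where metric_histories and the value column are hypothetical stand-ins for however you store each metric’s history (same shape as the tutorial’s DataFrame):

# Hypothetical: metric_histories maps a metric name to a DataFrame with
# "total_seconds", "value", and "anomaly" columns, like the tutorial data
models = {}
for name, history in metric_histories.items():
    clean = history[~history["anomaly"]]
    X = sm.add_constant(clean["total_seconds"].values)
    models[name] = sm.OLS(endog=clean["value"].values, exog=X).fit()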

As you branch out to more metric types, continue to revisit the model hyperparameters or consider new time series models to best capture the patterns in your metrics. Or if you’d rather not go through this entire process yourself, we’ve got you covered. Metaplane’s ML-based, always-on monitoring alerts you to every issue across your data stack.

Get started with Metaplane for free or book a demo to learn more.
