How to build a live time series anomaly detection model
Need to know when something’s off? Live time series models can detect anomalies for any metric in your data stack. Here’s how to build one yourself.
How do you detect when a dbt job is lagging?
How do you get early warnings of problems in ETL pipelines?
How do you answer questions like these?
Sure, you can track dozens of metrics to try and make sense of what’s happening in your data stack, but once you have them… what do you actually do with them?
Blindly tracking metrics is easy. Thoughtfully using them is surprisingly hard: you need to constantly review them and decide whether or not they’re healthy using contextual knowledge about your stack. Doing this by hand for hundreds of metrics is time-consuming—if not impossible—and a waste of effort if those metrics aren’t doing anything interesting.
That’s why we need tools to flag us when our once-consistent metrics start looking inconsistent.
Be the first to know, every time
The data team needs to know as soon as any metric looks irregular. Like, if a dbt job that usually takes 20 minutes now takes over an hour, or a regular Fivetran table sync that usually adds 1000 rows now only adds 50.
It might be just a weird fluke, but it might be a signal that something’s broken. If you know immediately when something looks not-quite-right, you can:
- Figure out if there’s a problem at all
- If so, fix the problem before it has major downstream effects
Live time series anomaly detection allows you to do this for any metric in your data stack. But what is it exactly? Let’s break it down:
- Live (aka real-time or online): Using a statistical model on the most recent data (e.g., evaluating the latest dbt job run time).
- Time series: A dataset that is indexed by time or a statistical model for such data (e.g., a dbt job’s run times over the past month).
- Anomaly detection: Flagging some data in a dataset as being unlike the rest (e.g., flagging an unusually long run time for one dbt job run).
Why time series?
Anomaly detection methods that use time-sensitive information perform much better than models that don’t—especially when you’re observing metrics that change over time (like data warehouse metrics that exhibit trends and cycles).
Trends change over time
Let’s say we track run times for a dbt job, and we ignore the time index. They have a distribution like this, with run times ranging from 10 minutes to 28 minutes, but most taking around 20:
If I told you that the last dbt job run took 25 minutes, would you consider that anomalous? Probably not—it falls neatly into the above distribution. But let’s look at the same data using the time index:
While a 25-minute dbt run might not have been anomalous in the past, you can see at the right edge of the chart that the run only takes about 15 minutes now. The run time of this job has been decreasing as the computing power assigned to the job has increased. So, if a new run did take 25 minutes, you might think, “Hm, that doesn’t look quite right,” and take the time to investigate.
Cycles (aka seasonality) need context
Imagine you have a Snowflake table that gets lots of inserts, but there are fewer on the weekend.
These weekend drop-offs aren't unexpected—they happen consistently each week. So, they're not really "anomalies" in the sense of something going wrong; they're just part of the regular cycle of how your system operates.
Now, if today were a Wednesday, and I told you that 7,000 rows were added in the last hour—no sweat. But if today were a Saturday? You’d probably want to look into it.
Is it an anomaly… or not?
Ultimately, we need a way to decide what counts as an anomaly and is worthy of our attention. Trends and cycles often carry a certain level of noise that can easily distract us from the real signals. Separating the normal noise from the “something’s wrong!” signal is key to identifying true anomalies.
Most methods (including Metaplane’s) use thresholds to decide what counts as normal vs anomalous. Depending on the metric, you might use static, rule-based, or model-based thresholds.
Static thresholds for anomaly detection
The simplest anomaly detection strategy is to set a threshold at a single value, especially if you have a clear idea of what you consider an “unhealthy” value and expect that definition to apply indefinitely.
This kind of static approach works well in scenarios where your system is stable and predictable, and you have a strong sense of what constitutes an outlier. It’s straightforward, so you can focus only on the most critical deviations without getting bogged down by minor variations.
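For instance, a static threshold check can be a single comparison. Here’s a minimal sketch (the 60-minute cutoff and the function name are illustrative, not prescribed):

# Hypothetical static threshold: flag any dbt job run longer than 60 minutes
STATIC_THRESHOLD_MINUTES = 60

def is_run_time_anomalous(run_time_minutes):
    # A run is flagged whenever it exceeds the fixed cutoff
    return run_time_minutes > STATIC_THRESHOLD_MINUTES

print(is_run_time_anomalous(25))  # False
print(is_run_time_anomalous(75))  # True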
But if your metrics vary over time (or when you're dealing with a dynamic system where "normal" values aren't so easy to pin down), static thresholds won’t work for your use case. You’ll either miss important anomalies or get way too many false positive flags.
Rule-based thresholds for anomaly detection
Sometimes, you need an approach that recognizes that what constitutes "normal" isn't static; it can vary based on context. That’s where rule-based thresholds come in.
A rule-based threshold allows you to set more nuanced criteria—or rules—that reflect the inherent rhythms of your data. They’re generally better aligned with your data’s real-world behavior, so they typically reduce noise and focus your attention on the deviations that genuinely matter.
Good news: the complexity of your data doesn’t necessarily require equally complex solutions. Most cases can be covered by simple rules.
Rule-based thresholds rest on clear, logical assumptions about how your data behaves. So, while this approach works well when you can express the rule with simple logic, that logic can become unwieldy or outright impossible to write for more complex cases.
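As a rough sketch of what such a rule could look like in code, echoing the weekday-vs-weekend insert example above (the bounds here are made up for illustration):

# Hypothetical rule: weekday and weekend inserts have different "normal" ranges
WEEKDAY_RANGE = (4000, 10000)  # illustrative rows-per-hour bounds
WEEKEND_RANGE = (500, 3000)

def is_insert_count_anomalous(rows_inserted, day_of_week):
    # day_of_week: 0 = Monday ... 6 = Sunday
    low, high = WEEKEND_RANGE if day_of_week >= 5 else WEEKDAY_RANGE
    return not (low <= rows_inserted <= high)

print(is_insert_count_anomalous(7000, day_of_week=2))  # Wednesday: False
print(is_insert_count_anomalous(7000, day_of_week=5))  # Saturday: True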
Model-based thresholds for anomaly detection
In particularly complex cases, statistical time series models are the most powerful option for setting thresholds because they can handle trends, cycles, and other common patterns. They don't just set a fixed threshold or apply a simple rule; they learn from the data itself.
And while there are many sophisticated models to choose from, even simple models can provide good performance. A linear regression can handle many cases.
By learning from historical data, these models can adapt to changing conditions. This is especially valuable in environments where "normal" is a moving target, and where traditional methods might either overfit (react too strongly to noise) or underfit (miss subtle but significant deviations).
DIY: Live time series model tutorial
Time series models are designed to handle various patterns in the data, such as trends over time, seasonal cycles, or even irregular events that occur with some predictability—that’s what makes them great for setting thresholds when simple rules won’t work. But they do require statistical knowledge to choose the right model, train it, and tune it correctly.
So let’s walk through the setup together. In this example, we’ll implement a linear regression model on some sample data. Here’s the step-by-step process for using a time series model to do anomaly detection on metrics as they come in:
- Download the data file from this link, then import the libraries you’ll need:
import numpy as np
import pandas as pd
import statsmodels.api as sm
- Plot your metric data over time and choose an appropriate time series model to use
# Load the sample data, parsing the timestamp column as dates
data = pd.read_csv('dbt_run_times.csv', parse_dates=["timestamp"], date_format="%m/%d/%y %H:%M:%S")
datetimes = pd.to_datetime(data["timestamp"])
# Convert timestamps to seconds elapsed since the first run (the model's input feature)
data["total_seconds"] = (datetimes - datetimes.iloc[0]).dt.total_seconds()
# Plot run times over time to inspect trends and cycles
data.plot(x="timestamp", y="runtime", xticks=[0, 192, 384], xlabel="Time of Run", ylabel="dbt Job Runtime (in minutes)")
💡 Pro tip: Consider adding other features in your model alongside the raw metric data, such as day of week or other metrics you might track.
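For example, a day-of-week feature could be derived from the timestamps already parsed above (the column name is arbitrary):

# Optional extra feature: day of week (0 = Monday ... 6 = Sunday)
data["day_of_week"] = datetimes.dt.dayofweek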
- Create a training dataset from the historical data, excluding any past anomalies.
# Exclude rows flagged as anomalies so the model learns only "normal" behavior
training_data = data[~data["anomaly"]].copy()
training_data.plot(x="timestamp", y="runtime", xticks=[0, 190, 380], xlabel="Time of Run", ylabel="dbt Job Runtime (in minutes)")
💡 Pro tip: If you don’t know what past data were anomalies, you can start from the beginning of your metric history and loop forward through it with your model, detecting if each new point is an anomaly.
- Set hyperparameters to generate thresholds.
# Significance level for the prediction interval (0.01 gives a 99% interval)
alpha = 0.01
💡 Pro tip: With linear regression, you will usually want to set an alpha level on the prediction interval.
- Train your model on the historical data, tuning the hyperparameters to maximize performance on anomaly detection.
# Fit an ordinary least squares regression of runtime on elapsed seconds
X_train = training_data["total_seconds"].values
y_train = training_data["runtime"].values
X_train_sm = sm.add_constant(X_train)  # add an intercept term
model = sm.OLS(endog=y_train, exog=X_train_sm)
result = model.fit()
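Optionally, you can sanity-check the fit before moving on; this inspection step isn’t part of the original walkthrough:

# Inspect the fitted intercept and per-second slope of the trend line
print(result.params)
# Or view the full regression summary
print(result.summary())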
- Generate prediction intervals for each point in the time series data.
def generate_prediction_intervals(ts, alpha):
    """
    Generate prediction intervals for each point in the time series data.

    Parameters:
        ts (pd.DataFrame): The time series data.
        alpha (float): Significance level for prediction interval.

    Returns:
        pd.DataFrame: DataFrame with predictions and intervals.
    """
    intervals = []
    # Iterate over the time series data
    for i in range(10, len(ts)):
        # Training set up to the current point
        train = ts.iloc[:i]
        X_train = train["total_seconds"].values
        y_train = train["runtime"].values
        X_train_sm = sm.add_constant(X_train)
        # Fit model on training data
        model = sm.OLS(endog=y_train, exog=X_train_sm)
        result = model.fit()
        # Get latest point to generate prediction
        latest = ts.iloc[i]
        X_latest = latest["total_seconds"]
        X_latest_sm = [1, X_latest]
        # Generate prediction and interval stats
        pred = result.get_prediction(X_latest_sm)
        pred_summary = pred.summary_frame(alpha=alpha)
        # Extract interval bounds
        forecast, pred_int_lower, pred_int_upper = pred_summary[['mean', 'obs_ci_lower', 'obs_ci_upper']].iloc[0].values
        # Append results
        intervals.append({
            'Date': ts.index[i],
            'Actual': ts.iloc[i, 1],
            'Forecast': forecast,
            'Lower Threshold': pred_int_lower,
            'Upper Threshold': pred_int_upper
        })
    # Create DataFrame from intervals
    intervals_df = pd.DataFrame(intervals)
    return intervals_df
- Validate your model by backtesting it as follows. See what gets flagged as an anomaly!
# Backtest: generate intervals across the history and plot them against the actual values
backtested_data = generate_prediction_intervals(data, alpha)
backtested_data.plot(x='Date', y=['Actual', 'Lower Threshold', 'Upper Threshold'])
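To see exactly which historical points get flagged, you can mark every backtested value that falls outside its interval (a small addition on top of the steps above):

# Flag any historical point that falls outside its prediction interval
backtested_data["is_anomaly"] = (
    (backtested_data["Actual"] < backtested_data["Lower Threshold"]) |
    (backtested_data["Actual"] > backtested_data["Upper Threshold"])
)
print(backtested_data[backtested_data["is_anomaly"]])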
💡 Pro tip: If there are no anomalies in the history, your tuned model should not flag any anomalies. If there are anomalies in the history, your tuned model should only flag those points.
- When a new metric value comes in, generate a forecast from the model for that timestamp, along with the associated thresholds.
# New runtime observed one hour (3,600 seconds) after the last run in the dataset
new_value = 16.2
new_total_seconds = data["total_seconds"].iloc[-1] + 3600
X_pred = [1, new_total_seconds]
# Thresholds for the new timestamp, using the same alpha as before
result.get_prediction(X_pred).summary_frame(alpha=alpha)[["obs_ci_lower", "obs_ci_upper"]]
If the new value is outside the thresholds, congratulations, you’ve detected your first anomaly!
If the new value is within the thresholds, add it to your training dataset and retrain the model. Depending on the velocity of the metric data and the retraining cost, you might not want to do this after every new value, but you should do it frequently enough that your model reflects the current data patterns.
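Here’s a minimal sketch of how that check and retraining might look, reusing the variables defined above:

# Compare the new value against the interval generated for its timestamp
pred = result.get_prediction(X_pred).summary_frame(alpha=alpha)
lower, upper = pred["obs_ci_lower"].iloc[0], pred["obs_ci_upper"].iloc[0]

if new_value < lower or new_value > upper:
    print("Anomaly detected!")
else:
    # Not anomalous: fold the new point into the training data and refit
    new_row = {"runtime": new_value, "total_seconds": new_total_seconds}
    training_data = pd.concat([training_data, pd.DataFrame([new_row])], ignore_index=True)
    X = sm.add_constant(training_data["total_seconds"].values)
    result = sm.OLS(endog=training_data["runtime"].values, exog=X).fit()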
An easier way to scale anomaly detection
Luckily, many time series models are not computationally intensive to train or generate predictions from. And once you’ve built this process for one metric, you can often reuse the same model structure for similar metrics, training one model per metric.
For example, if your linear regression model works well for the primary dbt job, you can use the same model structure for a nightly ETL job that also follows regular cycles. Of course, you'd need to retrain the model on the specific historical data of the ETL job so it can learn the patterns and behaviors of that specific job, but it cuts down the effort big time.
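In code, reusing the structure just means refitting the same regression on the other job’s history. A sketch, assuming a hypothetical etl_run_times.csv with the same timestamp and runtime columns:

# Same model structure, different metric: refit on the ETL job's own history
etl_data = pd.read_csv('etl_run_times.csv', parse_dates=["timestamp"])
etl_datetimes = pd.to_datetime(etl_data["timestamp"])
etl_data["total_seconds"] = (etl_datetimes - etl_datetimes.iloc[0]).dt.total_seconds()
X_etl = sm.add_constant(etl_data["total_seconds"].values)
etl_result = sm.OLS(endog=etl_data["runtime"].values, exog=X_etl).fit()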
As you branch out to more metric types, continue to revisit the model hyperparameters, or consider new time series models, to best capture the patterns in your metrics. Or, if you’d rather not go through this entire process yourself, we’ve got you covered. Metaplane’s ML-based, always-on monitoring alerts you to every issue across your data stack.
Get started with Metaplane for free or book a demo to learn more.