Get the essential data observability guide
Download this guide to learn:
What is data observability?
4 pillars of data observability
How to evaluate platforms
Common mistakes to avoid
The ROI of data observability
Unlock now
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Sign up for a free data observability workshop today.
Assess your company's data health and learn how to start monitoring your entire data stack.
Book free workshop
Sign up for news, updates, and events
Subscribe for free
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Getting started with Data Observability Guide

Make a plan to implement data observability across your company’s entire data stack

Download for free
Book a data observability workshop with an expert.

Assess your company's data health and learn how to start monitoring your entire data stack.

Book free workshop

A guide to dbt test types

Understanding the different types of dbt tests can be confusing. We break it all down in this post, telling you the different types of dbt tests, what they do, and when/how to use them.

and
December 20, 2024

Writer @ Metaplane | Data & technology

December 20, 2024
A guide to dbt test types

Since breaking onto the scene, dbt has helped data teams take the guesswork out of your data pipeline—giving them the tools to confidently answer the big question: “Can I trust this?”

dbt testing can be a bit confusing to get started with, though. You’ve got generic tests, custom tests, unit tests, and sub-sections of each of those. And what’s the deal with pre-built packages? How do you decide which ones to use and where they fit in your workflow?

This guide breaks it all down. We’ll walk through the different types of dbt tests, where they shine, and how to make them work for you.

Let’s dive in.

Types of dbt tests

The hierarchy of dbt tests can be confusing. That’s because there are two parent types of categories, then sub-categories within one of those parent types. We’ll aim to clear the air here. 

Testing in dbt generally falls into two broad categories: 

Data tests: Focus on ensuring the accuracy, consistency, and completeness of your data. Think of these as guardrails that help you enforce data quality expectations.

Unit tests: Focus on validating the logic of your models or macros. They’re designed to test small, isolated pieces of your transformation logic using controlled, mock data.

From there, though, data tests can be further categorized into two different categories: generic tests and custom tests. Let’s dive a bit further into each of these sections individually.

dbt data tests

Data tests are what most people think of when they think dbt testing. These tests focus on validating the accuracy, consistency, and completeness of your data. Think of these as guardrails that help you enforce data quality expectations.

Data tests can be broken down further into generic tests and custom tests.

Generic dbt tests

Generic tests are pre-defined, reusable checks that you can apply directly in your schema.yml file. These are perfect for the most common data quality rules and require almost no setup. dbt provides four built-in generic tests:

Unique: Ensures no duplicate values exist in a column (e.g., primary keys).

Not Null: Verifies that a column doesn’t contain any NULL values (e.g., every order_id should have a corresponding value).

Accepted Values: Ensures a column only contains specified values (e.g., payment methods like credit_card, paypal).

Relationships: Validates foreign key relationships (e.g., every customer_id in orders exists in customers).

Here’s an example of setting up a generic test in your schema.yml file:

```sql

models:

  - name: orders

    columns:

      - name: order_id

        tests:

          - unique

          - not_null

      - name: payment_method

        tests:

          - accepted_values:

              values: ['credit_card', 'paypal', 'bank_transfer']

      - name: customer_id

        tests:

          - relationships:

              to: ref('customers')

              field: customer_id

```

Custom dbt tests

For scenarios where generic tests don’t quite cut it, you can create custom SQL-based tests. These tests allow you to define specific validation logic, giving you full control. Custom tests live in the tests/directory and return rows that fail the validation.

For example, to check for duplicate orders by `order_id` and `order_date`:

```sql

SELECT order_id, order_date

FROM {{ ref('orders') }}

GROUP BY order_id, order_date

HAVING COUNT(*) > 1

```

When you run `dbt test`, dbt flags any rows returned by this query as test failures.

dbt test packages

Building all your tests from scratch can be daunting, but don’t worry—the dbt ecosystem has you covered. dbt has a lot of community-driven test packages that extend dbt’s functionality with ready-made solutions for common data quality checks and workflow improvements. Different tests are built for different things, too. Some can monitor your data, some can enforce best practices, and some can make your testing process more efficient.

Popular dbt test packages

  1. dbt-utils: This package is a bit of a swiss army knife, covering foundational tests that are applicable to most projects. Popular tests include equal_rowcount (ensures two tables have the same number of rows) and expression_is_true (validates custom conditions).
  2. Great Expectations: A standalone data validation framework that integrates with dbt. Great Expectations offers a highly customizable way to define, document, and execute complex data quality checks, with features like detailed error reporting and visualizations.
  3. dbt-expectations: Inspired by Great Expectations, this package provides a collection of pre-built, customizable tests. Examples include expect_column_values_to_be_increasing and expect_column_median_to_be_between. It’s great for adding nuanced checks without setting up the entire Great Expectations framework.
  4. dbt-checkpoint: A lightweight package that enforces coding standards and best practices in your dbt project. While not strictly for data tests, dbt-checkpoint helps ensure your dbt codebase is consistent, readable, and follows community conventions.

dbt unit tests

While data tests ensure the output of your pipeline is correct, unit tests validate the logic of your models or macros. They’re designed to test small, isolated pieces of your transformation logic using controlled, mock data.

For example, imagine you’re writing a complex dbt model that calculates revenue. Your business logic includes handling refunds, discounts, and taxes. If something goes wrong, how do you pinpoint the issue? Unit tests allow you to validate that each piece of logic behaves as expected in isolation before integrating it with real data.

How to write unit tests in dbt

Unit tests are all about validating that specific parts of your logic work as expected when isolated from the complexities of real-world data. While dbt doesn’t have native support for unit tests (yet!), you can simulate them using a combination of mock data, test models, and SQL assertions. Let’s walk through an example step by step.

Imagine you’re building a dbt model that calculates the final price for each order, taking into account discounts. This logic might be part of a larger data transformation pipeline, but you want to test it independently to ensure accuracy.

Step 1: Create mock data

When writing a unit test, it’s helpful to create mock data so that you can control the input data for testing. They act as stand-ins for your source tables, allowing you to work with predictable data.

Seed file: seeds/test_orders.csv

Order ID Order Total Discount
1 100 10
2 50 5
3 200 0
4 75 -10

Step 2: Build the transformation model

Next, write the dbt model you want to test. This model calculates the final price for each order by subtracting the discount from the total.

Model: models/calculate_final_price.sql

```sql

WITH source_data AS (

    SELECT * FROM {{ ref('test_orders') }}

)

SELECT

    order_id,

    order_total,

    discount,

    order_total - discount AS final_price

FROM source_data

```

The transformation takes the mock data from the seed file (test_orders) and applies the business logic: final_price = order_total - discount.

This logic needs to handle edge cases, like negative discounts or NULL values, correctly. Unit testing will help ensure your transformation behaves as expected.

Step 3: Write the unit test

Now, write a test to validate your transformation. The goal is to check if the calculated final_price ever falls below zero—an invalid scenario in this business context.

Unit Test Query: tests/test_calculate_final_price.sql

```sql

SELECT *

FROM {{ ref('calculate_final_price') }}

WHERE final_price < 0

```

This query selects any rows from the calculate_final_price model where final_price is negative. If the test returns rows, it indicates a failure because the business logic isn’t handling edge cases (like negative discounts) properly.

Step 4: Run the test

To execute the unit test, run the following commands:

```sql

dbt seed --select test_orders  # Load the seed file into your database

dbt run --select calculate_final_price  # Build the model

dbt test --select test_calculate_final_price  # Run the unit test

```

If everything works, the test will pass with no rows returned. If the test fails, dbt will flag the rows that don’t meet the conditions. For example:

Order ID Order Total Discount Final Price
4 75 -10 85

This failure tells you that your logic isn’t correctly handling negative discounts. You can then refine your model to address the issue (e.g., add a WHERE discount >= 0 clause).

Once you’ve identified and fixed any issues, rerun the test to ensure your changes work as expected. This iterative process helps you lock in the correctness of your transformation logic before it’s applied to live data.

How do dbt tests and data observability work together?

dbt tests excel at verifying known, static issues in your data, such as ensuring referential integrity, catching null values, or checking for duplicate primary keys. They are structured, explicit, and easy to maintain when used for targeted purposes.

But as data pipelines grow more complex, relying solely on dbt tests can lead to challenges like test suites that balloon to hundreds or thousands of tests.

This is where Metaplane’s data observability platform complements dbt. Instead of manually setting up hundreds of scalar outcome tests with hardcoded thresholds, Metaplane automatically monitors trends, detects anomalies, and alerts you when data behaves unexpectedly—all with minimal setup. It saves data teams from having to encode every potential scenario into a dbt test and provides insights into dynamic, unknown issues like unusual spikes in row counts or changes in data patterns. 

Increase trust in your data with dbt tests

Testing in dbt isn’t just about preventing mistakes—it’s about building trust in your data. By leveraging a mix of data tests, unit tests, and assertions, you can create a pipeline that not only works but delivers consistent, high-quality insights.

Start small with generic tests, explore custom and unit tests as your needs grow, and don’t forget to tap into the power of pre-built packages. With the right approach, dbt testing will become one of the most valuable tools in your data engineering toolkit.

Try Metaplane today and pair your dbt tests with ML anomaly detection for better data quality.

We’re hard at work helping you improve trust in your data in less time than ever. We promise to send a maximum of 1 update email per week.

Your email
Ensure trust in data

Start monitoring your data in minutes.

Connect your warehouse and start generating a baseline in less than 10 minutes. Start for free, no credit-card required.