Everything you need to know about dbt tests: How to write them, examples, and best practices

Here's everything you need to know about dbt tests, including what they are, how to write them, examples, and best practices for implementing them in your data workflow.

January 6, 2025

Over 25,000 companies use dbt (data build tool), so it's safe to say it's quite popular in the data analytics and engineering scene. In the modern data stack, dbt plays a crucial role in automating data transformations, ensuring data quality, and driving transparency in the data workflow.

One of the most important tasks data engineers handle is ensuring data quality and integrity across the data pipeline. This is done by running dbt tests, which identify data anomalies and issues early in the data transformation process. 

With dbt, analytics engineers can automate testing, which reduces errors, builds trust in analytics, avoids data downtime, and supports a culture of data-driven decision-making. However, navigating dbt tests can be confusing, especially when you're trying to understand their nuances. Whether you're wondering how to set up your first test or how to run tests in your CI/CD pipeline, this guide covers everything you need to know about dbt tests.

Let’s dive in.

What are dbt tests?

A dbt test is an assertion that validates your data to ensure quality and integrity. While tests are most commonly applied to models, they can also be applied to other dbt objects such as sources, seeds, and snapshots.

Tests are primarily used to confirm that the data meets expectations. This includes checking for null values, ensuring values fall within a defined range, or verifying relationships between datasets. Additionally, dbt allows for the creation of custom tests to validate specific business logic or rules unique to the data pipeline.
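For instance, the same declarative test syntax works on a source. Here's a minimal sketch, assuming a hypothetical `shopify` source with an `orders` table declared in a schema.yml file:

```yaml
sources:
  - name: shopify              # hypothetical source name
    schema: raw
    tables:
      - name: orders
        columns:
          - name: order_id
            tests:
              - unique
              - not_null
```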

Types of tests in dbt

There are two main types of dbt tests: generic and custom. Generic tests are pre-built tests in dbt, defined in schema.yml, for common data quality checks. They're easy to implement and require minimal effort. Custom tests are tests you write yourself to validate specific business logic or project-specific rules.

Custom tests require more complex SQL queries to check for specific conditions that go beyond standard data integrity checks. Custom and generic tests serve different but complementary purposes in ensuring data quality, and knowing when and how to use each type allows you to build a more robust and reliable data pipeline.

Built-in generic tests

Generic tests, also referred to as schema tests, are predefined in dbt. The most common generic tests include:

  • Unique 
  • Not Null
  • Accepted values 
  • Relationships

Unique tests 

Unique tests verify that every value in a specified column is distinct, ensuring there are no duplicates. These tests are particularly important for columns meant to serve as unique identifiers, such as `order_id` or `customer_id`. Unique tests maintain data integrity and prevent errors caused by duplicate entries.

For example, a unique test applied to the `order_id` column might look like this:

```yaml
columns:
  - name: order_id
    tests:
      - unique
```

Not null

Not null tests ensure that no `NULL` values exist in a column. They're commonly applied to mandatory fields like `order_date` or `customer_email` to ensure there's no missing information that could disrupt analytics.

A not null test applied to the `order_date` column might look like this:

```yaml
columns:
  - name: order_date
    tests:
      - not_null
```

Accepted values

Accepted values tests verify that a column’s entries match a specific, predefined set of valid options. The valid options are based on the business logic or rules governing the data. Accepted value tests are useful for fields with limited, controlled values, such as `status` or `region_code`, to ensure data consistency and prevent invalid entries.

For example, an `accepted_values` test for the `status` column would be configured as follows:

```yaml
columns:
  - name: status
    tests:
      - accepted_values:
          values: ['pending', 'completed', 'cancelled']
```

Relationships

Relationships tests validate that foreign key relationships are maintained between two tables. A foreign key is a column in one table that establishes a relationship with the primary key in another table.

For instance, data engineers use a relationships test to validate that every `customer_id` in the `orders` table corresponds to a valid `id` in the `customers` table:

```yaml
columns:
  - name: customer_id
    tests:
      - relationships:
          to: ref('customers')
          field: id
```

Custom generic tests

Custom generic tests are reusable tests that define complex or project-specific rules tailored to your data needs. You define them yourself, depending on what you want to validate. For instance, you can validate domain-specific constraints, such as ensuring `sales_amount` contains no negative values, or enforce consistency by reapplying the same logic across multiple models or columns.

How to create custom generic tests

1. Define the test in SQL

Create a custom test by defining it in a `.sql` file within the `macros/` folder. For example, to test for positive values, you can create a file named `test_positive_values.sql`.

```sql
{% test test_positive_values(model, column_name) %}

SELECT *
FROM {{ model }}
WHERE {{ column_name }} <= 0

{% endtest %}
```

The file is now: `macros/custom_tests/test_positive_values.sql`

2. Reference the test in schema.yml:

Add the custom test to your schema.yml file, specifying the column where it should be applied. For instance, this configuration ensures that dbt will run the `test_positive_values` test on the `sales_amount` column, checking for any non-positive values.

```yaml
columns:
  - name: sales_amount
    tests:
      - test_positive_values
```

With this in place, the custom rule is validated alongside your other tests every time you run `dbt test`, integrating it seamlessly into the dbt project.

Singular tests

Singular tests validate specific logic or assertions that aren't tied to a particular column. They're defined as standalone SQL queries in the `tests/` folder and are used to enforce business rules or integrity checks that apply across models.

Additionally, data engineers run singular tests for one-off checks that don’t fit into standard column-based tests. A singular test returns rows that fail the test, highlighting where the data doesn't meet the specified criteria.

Writing singular tests:

  1. Create a SQL file in the `tests/` folder: The test is written as a standalone SQL query.
  2. Write the SQL query for the test: The query should check a specific logic or rule, such as validating relationships between tables or checking for data anomalies. Data engineers typically use `SELECT` statements with the necessary `JOIN`, `WHERE`, and other SQL logic.
  3. Reference the relevant models (tables/views): Use the `ref()` function to refer to models (tables or views) in your dbt project. The `ref()` function ensures that dbt handles the dependencies correctly.
  4. Save the file: Save the SQL file in the `tests/` folder with a descriptive name, such as `test_orders_with_no_customers.sql`.
  5. Run the test: After creating the test, run it with the `dbt test` command.

Let’s say you want to create a singular test to check if any orders in the `orders` table don’t have an associated customer in the `customers` table. 

Here’s what it looks like:

```sql
SELECT o.order_id
FROM {{ ref('orders') }} o
LEFT JOIN {{ ref('customers') }} c
    ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL
```

Unit tests

Unit tests in dbt validate the correctness of complex transformations or logic within models. Similar to traditional software unit tests, these tests work by isolating specific logic for validation. Suppose you have a transformation in your `sales` model where you're calculating the total sales amount for each customer. You want to validate that the `total_sales` field is correctly calculated by summing individual sales values. 

Unit tests allow data engineers to create a test model that mimics the actual data but only includes a small subset of data relevant to the calculation. Unit tests isolate specific data transformations or processes, which makes debugging easier as data engineers can quickly identify issues in individual components of the data pipeline. Testing small, focused pieces of logic (such as data cleaning steps, aggregations, or transformations) means that engineers can validate each part of the pipeline before moving on to the next stage. 

Here is an example with a small dataset that includes two e-commerce orders: one with a positive order amount and one with a negative order amount. The goal is to test the logic that ensures the order amount is never negative. Save this mock data as a model, for example `models/test_orders.sql`:

```sql
SELECT 1 AS order_id, '2024-01-01' AS order_date, 100 AS order_amount
UNION ALL
SELECT 2 AS order_id, '2024-01-02' AS order_date, -50 AS order_amount
```

In this test dataset, we have:

  • A valid order with a positive amount (100).
  • An invalid order with a negative amount (-50).

Next, we write an assertion as a singular test to validate the rule that `order_amount` should never be negative. This is done by creating a file in the `tests/` folder, such as `tests/test_positive_order_amount.sql`:

```sql
SELECT *
FROM {{ ref('test_orders') }}
WHERE order_amount < 0
```

This test query checks whether any orders in the `test_orders` model have a negative `order_amount`. Since the test dataset includes an invalid negative value, this query will return that row.

Next, run the following test command to execute the test:

```bash
dbt test
```

The test passes when the query returns no rows, confirming the logic works as intended. If the test returns rows where `order_amount` is negative, it indicates an error in the data, allowing you to isolate the failure and address the specific logic causing the issue.
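If you're on dbt 1.8 or later, this kind of check can also be expressed as a native dbt unit test, defined in YAML with mocked inputs and expected outputs. Here's a minimal sketch for the hypothetical `sales` model described above, assuming it sums `order_amount` per customer from an `orders` input:

```yaml
unit_tests:
  - name: test_total_sales_per_customer
    model: sales                      # hypothetical model that aggregates orders
    given:
      - input: ref('orders')
        rows:
          - {customer_id: 1, order_amount: 100}
          - {customer_id: 1, order_amount: 50}
    expect:
      rows:
        - {customer_id: 1, total_sales: 150}
```

Because the inputs are mocked, the test exercises only the model's transformation logic, independent of whatever data is currently in the warehouse.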

Here’s a quick recap of dbt tests:

| Test type | Definition | Use case |
| --- | --- | --- |
| Built-in generic | Predefined tests for uniqueness, null values, accepted values, and relationships. | Common data quality checks |
| Custom generic | User-defined, reusable tests written in SQL and YAML. | Validate complex or domain-specific rules |
| Singular | Standalone SQL queries that validate specific business rules. | One-off checks and cross-model validations |
| Unit | Isolated tests that validate complex transformations or logic. | Ensure SQL transformations produce expected results; test derived metrics and aggregations |

How dbt tests fit into CI/CD pipelines

When integrated with CI/CD pipelines, dbt can provide a solid framework for testing, deploying, and monitoring data models, ensuring data reliability at every step. Validating code before it reaches production enables engineers to avoid spending time addressing issues from broken reports and maintains trust in data systems. 

A well-configured CI/CD setup can minimize downtime and prevent unnecessary disruptions. For example, you can automate the testing and validation of your data transformations by triggering a CI/CD pipeline whenever code changes are made, ensuring your models are reliable and ready for deployment.
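What that trigger looks like depends on your CI system. Here's a minimal sketch of such a pipeline, assuming GitHub Actions, a hypothetical `dbt-snowflake` adapter, and credentials stored as repository secrets:

```yaml
# Hypothetical CI workflow: run dbt tests on every pull request.
name: dbt-ci

on:
  pull_request:

jobs:
  dbt-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dbt
        run: pip install dbt-snowflake   # swap in the adapter for your warehouse
      - name: Build and test models
        env:
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}   # hypothetical secret
        run: |
          # assumes a profiles.yml committed for CI or generated from env vars
          dbt deps
          dbt build   # runs models, seeds, snapshots, and their tests
```

In practice you'd also point dbt at a CI profile (for example by setting `DBT_PROFILES_DIR`) so the job builds into an isolated schema rather than production.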

Metaplane customers like Gorgias use Metaplane's CI/CD tool to automate testing. With over 20 data analysts and engineers developing models in dbt, the data analytics and engineering team ensures that it can identify affected models and prevent code changes from impacting production data.

Even in organizations with a strong data culture that emphasizes testing code before merging into the main branch, engineers may occasionally skip testing (intentionally or unintentionally). 

When a model is edited and merged into the main branch without testing, downstream models may fail to build in the production environment. Or, a critical table like the `customers` table breaks, leaving other teams that rely on data frustrated with outdated or missing data. 

Additionally, SQL logic is sometimes changed without approval of new business rules, or the logic itself is incorrect, which leads to errors. While extending code reviews or increasing the number of approvers might address these issues, data engineers need an automated process to maintain reliability and efficiency.

More specifically, here's dbt's role in CI/CD pipelines:

Automated testing with dbt

dbt's testing framework provides the foundation for CI/CD in data pipelines. Built-in or custom dbt tests run directly in the CI/CD pipeline whenever engineers change data models. Automated testing in the CI/CD pipeline ensures that data quality issues, such as failed transformations or breaking changes, are detected early and prevented from progressing further.

Elliot Trabac, data and analytics engineer at Gorgias, notes that "having additional information, like being aware of what changes will happen and which models they'll affect, alongside existing CI/CD checks, adds an extra layer of security." Automated testing with dbt streamlines the workflow and enhances the reliability and accuracy of data transformations.

Code and model validation

dbt plays a key role in validating code and models before deployment to ensure smooth and reliable data transformations. Syntax validation ensures that SQL code and dbt configuration files, like `schema.yml` and `dbt_project.yml`, are written correctly and free of errors.

Model builds validate that the data models are compiled correctly and function as expected when tested with sample or live data. These automated checks within a CI/CD pipeline flag potential issues early, which streamlines the deployment process and enhances the reliability of data models before they reach production.

Lineage and dependency management

dbt's lineage and dependency management automatically generates a dependency graph that tracks the relationships between models in the data pipeline. During CI, dbt ensures that changes to one model do not break downstream dependencies, while also helping identify which models may be impacted by changes made upstream.

This added visibility helps teams maintain data integrity and keeps transformations consistent and reliable as updates are made. It also reduces the manual effort of checking how code changes might affect other parts of the pipeline, helping teams avoid errors.

Documentation as a deliverable

With dbt, documentation is automatically updated as part of the CI/CD pipeline, ensuring that accurate and up-to-date data documentation is available post-deployment. The automated documentation generation helps maintain a consistent record of data models, their dependencies, and transformations, making it easier for teams to understand and collaborate on data projects.

Specifically, this dbt feature supports efficient data governance, promotes transparency, and ensures that everyone on the team is working with the most current and accurate information as the project evolves.

Environment-specific deployments

dbt allows models to be executed across different environments, such as development, staging, and production, while ensuring a controlled and efficient deployment process. CI/CD pipelines can be configured to first build and test models in staging environments, verifying that everything works as expected without impacting live data. 

Once these tests pass, models are safely deployed to production environments, reducing the risk of introducing errors into the live data. This approach provides a structured way to validate data transformations and ensures that only fully tested models make it into production, maintaining the integrity and reliability of the data pipeline.
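Environment separation is configured through dbt targets. Here's a minimal sketch of a profiles.yml with dev and prod targets, assuming a hypothetical Postgres warehouse and project name:

```yaml
# Hypothetical profiles.yml: one profile, separate dev and prod targets.
my_project:
  target: dev                  # default target for local development
  outputs:
    dev:
      type: postgres
      host: localhost
      user: dev_user
      password: "{{ env_var('DBT_DEV_PASSWORD') }}"
      dbname: analytics
      schema: dbt_dev          # developer sandbox schema
      port: 5432
    prod:
      type: postgres
      host: prod-warehouse.internal
      user: dbt_prod
      password: "{{ env_var('DBT_PROD_PASSWORD') }}"
      dbname: analytics
      schema: analytics        # production schema
      port: 5432
```

A CI/CD pipeline can then run `dbt build --target dev` (or a dedicated staging target) before promoting the same code with `--target prod`.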

How to implement dbt tests

We've covered what dbt tests do, the different types of dbt tests, and why they're important for CI/CD. Now, let's get more tactical and learn how to write and implement dbt tests.

Writing basic built-in tests

dbt provides built-in tests for common data quality checks, which are defined in schema.yml files.

Step 1: Define a model

Create a model, for example, `models/orders.sql`:

```sql
SELECT * FROM raw.shopify_orders
```

Step 2: Add tests in schema.yml

Define the schema file in the same folder as the model:

```yaml
models:
  - name: orders
    description: "Order data from Shopify"
    columns:
      - name: order_id
        description: "Unique identifier for each order"
        tests:
          - unique
          - not_null
      - name: customer_id
        description: "Identifier for customers"
        tests:
          - not_null
```

  • `unique` ensures no duplicate values in the column.
  • `not_null` ensures there are no `NULL` values.

Step 3: Run the tests

Execute the following command to run the tests:

```bash
dbt test
```

This command will check the `unique` and `not_null` constraints on the specified columns and will fail the test if any issues are detected.

Setting up custom tests

For more complex data validation, you can write custom tests using SQL and YAML.

Step 1: Create a custom test

Custom tests are SQL queries that return rows failing the test.

Create a new test in the `tests` folder, e.g., `tests/test_positive_order_amount.sql`:

```sql
SELECT *
FROM {{ ref('orders') }}
WHERE order_amount <= 0
```

Any rows returned by this query indicate a test failure. 

Step 2: Reference reusable tests in schema.yml

Singular tests in the `tests/` folder are picked up automatically when you run `dbt test`; they don't need to be registered. If you want to reuse the same check across models, define it as a custom generic test (like `test_positive_values` from earlier) and reference it by name in your schema file:

```yaml
models:
  - name: orders
    description: "Order data from Shopify"
    columns:
      - name: order_amount
        description: "The total amount for the order"
        tests:
          - not_null
          - test_positive_values   # custom generic test defined in macros/
```

Step 3: Run the custom test

Run all tests, including custom and singular ones:

```bash
dbt test
```

Structuring your tests/ folder for scalability

Organize your `tests/` folder for maintainability and scalability. Here is a recommended structure:
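One possible layout (the project name and exact file names below are illustrative):

```
my_dbt_project/
├── models/
│   ├── orders.sql
│   └── schema.yml                      # model-specific tests (unique, not_null, ...)
└── tests/
    ├── generic/                        # reusable custom generic tests
    │   ├── test_positive_values.sql
    │   └── test_valid_email_format.sql
    └── integration/                    # cross-model checks
        └── test_orders_with_no_customers.sql
```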

Folder details:

  1. generic/:
    • Contains reusable custom tests (e.g., positive values, valid email format).
    • These tests can be applied to multiple models or columns via schema.yml.
  2. integration/:
    • Contains tests for inter-model relationships (e.g., referential integrity between `orders` and `customers`).
  3. Model-specific tests:
    • Simple tests like `unique` or `not_null` are defined in schema.yml near the corresponding model.

Best practices for writing effective dbt tests

Writing dbt tests is one thing, but setting yourself up for success with your testing workflow is another thing entirely. Here are a few best practices to follow when implementing dbt tests.

Use descriptive naming conventions for tests

Choose names that immediately convey the test's purpose to anyone reviewing it. Include relevant details, such as the model being tested and the specific check being performed to provide context. Descriptive names help you quickly identify what a test is checking during debugging or reviewing logs.

Examples of descriptive test names:

  • Built-in tests: `unique_customer_id`, `not_null_order_date`
  • Custom tests: `test_positive_sales_amount`, `test_valid_email_format`

Create a logical folder structure

Structure your `tests/` folder logically to improve scalability and readability. Organized folders make it easier to locate, maintain, and extend tests.

A recommended structure is shown in the "Structuring your tests/ folder for scalability" section above: keep reusable generic tests and cross-model integration tests in dedicated subfolders, and keep simple column tests in schema.yml next to their models.

Choose the right granularity for testing

Focus on business-critical rules and avoid redundant testing, such as adding a custom check that duplicates what `unique` or `not_null` already verifies on the same column. Apply aggregations thoughtfully: for instance, validate total sales over specific date ranges rather than checking every individual row. Under-testing leads to poor data validation, while over-testing creates noise and developer inefficiency.
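One way to encode this prioritization is dbt's test severity configuration, which lets non-critical checks warn instead of failing a run. A small sketch, assuming a hypothetical `discount_code` column where missing values are worth flagging but not blocking:

```yaml
columns:
  - name: discount_code        # hypothetical nice-to-have field
    tests:
      - not_null:
          config:
            severity: warn     # report the issue without failing the pipeline
```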

Use assertions for clear failures

Use SQL assertions to ensure failures are clear and easy to debug. dbt tests should provide actionable results by returning rows that fail a specific condition. Actionable failures help developers quickly locate and fix data issues. For example, consider this custom test, `test_no_future_dates.sql`:

```sql
SELECT *
FROM {{ ref('orders') }}
WHERE order_date > CURRENT_DATE
```

The SQL assertion is a custom test to ensure there are no future-dated orders in the data. In this test, any rows returned would indicate an issue: orders with a date later than the current date. These actionable failures provide clear, concise feedback that helps developers trace the root cause of the problem and implement fixes efficiently, which maintains data integrity and reliability.

Write tests to ensure clear and actionable failure messages

Clear failure messages reduce debugging time and improve collaboration across teams. For custom tests, return columns that describe the failing rows so other engineers and analysts can understand what's wrong. Including that context in the assertion makes debugging easier.

This test checks the `orders` table for rows where `order_date` is in the future. Any rows returned include the specific `order_id` and `order_date` values causing the test to fail, which provides clear, actionable details for debugging.

```sql
SELECT
    order_id,
    order_date
FROM {{ ref('orders') }}
WHERE order_date > CURRENT_DATE
```

The schema.yml file is where you define metadata, such as test configurations and descriptions for your models and their fields; use clear descriptions there. The assertion below includes the built-in tests `not_null`, which ensures there are no `NULL` values in the `order_date` column, and `relationships`, which validates foreign key integrity by confirming that `order_date` maps correctly to `date_sk` in the `dim_date` table.

If these tests fail, dbt generates detailed failure messages, highlighting the rows or conditions that caused the issue, which helps teams address the problem quickly.

```yaml
models:
  - name: orders
    columns:
      - name: order_date
        tests:
          - not_null
          - relationships:
              to: ref('dim_date')
              field: date_sk
          # the singular test in tests/test_no_future_dates.sql runs automatically with dbt test
```

Track and resolve test failures promptly

Integrate dbt tests with CI/CD pipelines to catch and resolve issues as early as possible. Prompt resolution prevents bad data from propagating to downstream systems. Use dbt artifacts (e.g., `run_results.json`) to monitor test outcomes, and use Metaplane's dbt alerting tool to get context-rich alerts on failed tests delivered to your dashboard.

Metaplane's dbt alerting tool enables you to quickly identify test failures with context, so you can assess the severity of each failed job faster and manage alert routing across your data team. Avoid alert fatigue by routing the relevant alerts to the right stakeholders.

Combine dbt tests with data observability tools 

Run dbt tests alongside a data observability tool like Metaplane to monitor data quality across your entire data pipeline. While dbt tests validate the correctness of transformations, Metaplane provides automated, granular monitoring that captures nuances specific to data types and sources.

Metaplane accurately captures each step of the data flow and predicts potential data quality issues before they arise. Whenever a new data source is added, an existing source is removed, or a source is renamed, Metaplane automatically detects these changes and incorporates them into the dbt model without any manual intervention. This integration ensures comprehensive, proactive monitoring, allowing for faster identification and resolution of issues, ultimately maintaining data reliability and consistency across your systems.

Run dbt tests for a reliable data pipeline

dbt tests are a crucial tool to ensure data quality and build trust in your data pipeline. Running dbt tests regularly enables you to proactively address issues, avoid downtime, and ensure that your organization’s data is always accurate and reliable. 

With Metaplane's end-to-end dbt observability, you can monitor data quality in real time, track the performance of your dbt models, and route alerts for any anomalies or failures to the right stakeholder.

Catch potential issues early and resolve them swiftly to ensure consistent performance and maintain the integrity of your data-driven decision-making process.

Try Metaplane today to pair your dbt tests with end-to-end data observability and ensure data quality across your team.
