dbt macros: What they are and why you should use them

Learn how to use dbt macros to write smarter, reusable SQL that's easy to maintain.

Technical Educator

January 27, 2025

As your data engineering team scales, one of the most important tasks is maintaining consistent and efficient data transformations. This becomes increasingly challenging as the volume of data and the number of data sources grow, and as more pipelines make it harder to keep the stages of extraction, transformation, and loading aligned. dbt macros enable you to create reusable code blocks that promote consistency, reduce redundancy, and accelerate your development process.

These macros, which are essentially reusable Jinja code snippets, help your team adhere to the DRY (don’t repeat yourself) principle and execute transformations efficiently. Additionally, dbt macros allow for better collaboration among team members by centralizing and standardizing transformation logic. 

With macros, your team can ensure that everyone follows the same process, reducing errors and discrepancies in the data pipeline. This approach speeds up the development process, as engineers can quickly reuse well-tested macros rather than recreating logic from scratch each time.

If you use dbt regularly but haven’t yet used macros or if you’re looking for ways to optimize and streamline your data transformation processes, this article will walk you through what dbt macros are, where they’re helpful, and how to write macros. 

Let’s dive in.

What are dbt macros?

dbt macros are reusable blocks of SQL or Jinja logic that help simplify and standardize your data transformations across multiple models. By using macros, you centralize and reuse complex SQL code, ensuring consistency across your dbt models.

Built on Jinja, a powerful templating language for Python, dbt macros support Jinja features like loops, conditions, and variables to create flexible, dynamic SQL. For example, the following simple dbt macro uses Jinja to dynamically generate a SELECT statement based on an input value:

```sql
{% macro example_macro(input_value) %}
  select {{ input_value }}
{% endmacro %}
```

In this example, the macro takes an input value and generates a SELECT statement based on that input. The main advantage of dbt macros comes from Jinja’s ability to generate dynamic SQL, enabling you to write flexible, reusable code that adapts to different scenarios and datasets without repetition.

How macros differ from models

Macros provide reusable logic for SQL and act like functions in programming: they enable dynamic code generation and reduce redundancy. They are written in Jinja and stored in the macros directory. Models represent tables or views in the database and define how data should be transformed; they consist of SQL queries and are stored in the models directory.

Macros are not materialized; they are used within models or tests to generate SQL dynamically. Models are materialized as tables or views in the database, directly transforming and storing data.

| Aspect | Macros | Models |
| --- | --- | --- |
| Purpose | Reusable logic for SQL | Tables or views in the database |
| Syntax | Jinja-based | SQL queries |
| Location | `macros` directory | `models` directory |
| Execution | Used within models or tests | Transforms data directly |

Basic macro structure and syntax

Let’s take a look at this basic macro:

```sql
{% macro calculate_percentage(numerator, denominator) %}
    case
        when {{ denominator }} = 0 then null
        else {{ numerator }} * 100.0 / {{ denominator }}
    end
{% endmacro %}
```

Now, let’s break down the basic macro structure.

The macro starts with `{% macro <macro_name>(parameters) %}` and ends with `{% endmacro %}`. In this case, the macro is named `calculate_percentage`; it takes two parameters, `numerator` and `denominator`, and generates a SQL `CASE` statement to calculate a percentage.

The macro contains SQL logic written dynamically using Jinja placeholders (`{{ }}`). These placeholders allow you to inject the values of the parameters (`numerator` and `denominator`) into the SQL code. Macros do not execute independently; they are invoked within a model or another macro to dynamically generate SQL code.
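To see how this plays out, here’s a minimal sketch of calling the macro from a model. The model and column names (`orders`, `discount_amount`, `order_total`) are assumptions for illustration:

```sql
-- models/order_discounts.sql (hypothetical model)
select
    order_id,
    -- Renders as the CASE statement above, guarding against division by zero
    {{ calculate_percentage('discount_amount', 'order_total') }} as discount_pct
from {{ ref('orders') }}
```

When dbt compiles the model, the Jinja call is replaced with the full `CASE` expression, so the warehouse only ever sees plain SQL.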

Where do macros live in your dbt project structure?

Macros live in the `macros/` directory of your dbt project, which is specifically designated for reusable logic. You can also add a `.yml` file (commonly named `macros.yml`) in that directory to document your macros and their arguments. To organize macros, group them into separate files or folders (e.g., `macros/utils/`).

Where dbt macros are helpful

dbt macros are incredibly helpful in several key areas of your data transformation workflow. By turning complex SQL patterns into reusable blocks, macros enable you to focus on building efficient data pipelines rather than writing boilerplate code repeatedly. Here are the areas where they help most:

SQL transformations

dbt macros allow you to write reusable, customizable logic that can be applied across multiple models or transformations. This reduces redundancy and ensures consistency throughout the project. Here are some specific SQL transformations where you can use dbt macros.

Standardizing column names

One of the fastest ways to frustrate a data engineer is with inconsistent column names. Imagine trying to JOIN tables or run queries when the same column is labeled Product_ID, product_id, or Prod_ID. Inconsistent column names lead to confusion, errors, and inefficiencies.

While you might not create tables with inconsistent column names yourself, when your data comes from multiple sources you often have to deal with these discrepancies. You can use a macro to standardize column names, for example by converting them to lowercase and replacing spaces with underscores.

```sql
{% macro standardize_column_name(column_name) %}
lower(replace({{ column_name }}, ' ', '_'))
{% endmacro %}
```

Apply this macro in transformations such as `orders_and_sales_summary` and `product_performance_summary` to ensure that all column names follow a consistent naming convention.
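As a quick illustration, here’s one way the macro might be called inside a model; the model and column names are assumptions:

```sql
-- models/product_performance_summary.sql (hypothetical usage)
select
    -- Compiles to: lower(replace(product_name, ' ', '_'))
    {{ standardize_column_name('product_name') }} as product_name,
    count(*) as order_count
from {{ ref('orders_snapshot') }}
group by 1
```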

Converting data types

Converting data types is an essential step to ensure consistency in your data pipeline. When working with data, it's common to encounter columns that may need to be standardized to a specific data type to ensure that calculations, joins, or aggregations are performed correctly.

For instance, if a column like `LINEITEM_QUANTITY` in the `orders_snapshot` table is stored as a string or floating-point number, it may cause issues when performing mathematical operations or joins. You can use a macro to consistently convert these values into the correct data type, in this case, an integer. The macro would look like this:

```sql
{% macro cast_to_integer(column) %}
cast({{ column }} as integer)
{% endmacro %}
```

This macro defines a reusable function that takes a column name as an argument and casts it to an integer. Now you can ensure that all relevant numeric columns in your transformations are consistently converted to integers, which improves the accuracy and reliability of your data pipeline.
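A hedged example of how this could look in a model, using the `orders_snapshot` table mentioned above (the downstream model name is assumed):

```sql
-- models/orders_cleaned.sql (hypothetical model)
select
    order_id,
    -- Compiles to: cast(lineitem_quantity as integer)
    {{ cast_to_integer('lineitem_quantity') }} as lineitem_quantity
from {{ ref('orders_snapshot') }}
```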

Handling NULL values

NULL values can cause errors or lead to incorrect results when performing calculations, aggregations, or joins. A common approach to handle NULLs is to replace them with a default value, which ensures that your data remains consistent and usable throughout the transformation process.

```sql
{% macro replace_null(column, default) %}
coalesce({{ column }}, {{ default }})
{% endmacro %}
```

This macro uses the `COALESCE` function to check if a column contains a NULL value. If the column is NULL, it replaces it with the specified default value.
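For example, here’s a sketch of using it in a model; the column and default value are assumptions for illustration:

```sql
select
    order_id,
    -- Compiles to: coalesce(discount_amount, 0)
    {{ replace_null('discount_amount', 0) }} as discount_amount
from {{ ref('orders_snapshot') }}
```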

Cross-database compatibility

Cross-database compatibility is essential when working with multiple databases or data warehouses, as platforms like Snowflake, BigQuery, and others can have slight variations in SQL syntax and supported functions.

Macros ensure your code works across these databases by making your SQL code database-agnostic. By defining reusable macros for common functions or syntax, you can avoid rewriting the same logic for each database.

For example, if you have data stored across multiple data warehouses, like Snowflake and BigQuery, and you need to write a transformation to capture the current timestamp, you can use a macro to generate the timestamp in a consistent, database-agnostic way, rather than writing different SQL queries for each platform.

For example:

```sql
{% macro current_timestamp() %}
{{ return(adapter.dispatch('current_timestamp')()) }}
{% endmacro %}
```

This macro uses `adapter.dispatch`, which lets you adapt the `current_timestamp` function for different databases. The dispatch mechanism checks which adapter (database) you’re running against and generates the appropriate SQL syntax for that platform.

For models like `orders_and_sales_summary`, you may need to use the current timestamp to track when data was last updated or to filter data by the most recent records. Using this macro ensures that your SQL works consistently across different database platforms without having to rewrite timestamp logic for each one.
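Under the hood, `adapter.dispatch('current_timestamp')` looks for a macro named `<adapter>__current_timestamp` and falls back to `default__current_timestamp`. Here’s a minimal sketch of what those adapter-specific implementations might look like; the exact SQL per warehouse is illustrative:

```sql
-- Fallback used when no adapter-specific version exists
{% macro default__current_timestamp() %}
    current_timestamp
{% endmacro %}

-- Picked automatically when running against Snowflake
{% macro snowflake__current_timestamp() %}
    convert_timezone('UTC', current_timestamp())
{% endmacro %}

-- Picked automatically when running against BigQuery
{% macro bigquery__current_timestamp() %}
    current_timestamp()
{% endmacro %}
```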

Additionally, if you work with multiple databases or data warehouses, certain SQL functions may have different syntaxes or behave slightly differently depending on the database or warehouse. To handle this, you can define reusable macros for specific functions like date differences, ensuring your code remains consistent across databases without worrying about platform-specific syntax.

Suppose you are working with a dataset like `customer_order_history`, where you need to calculate the difference between a customer's `first_order_date` and `last_order_date` to understand how long it took for them to make their most recent purchase.

Without a macro, you might have to write different `DATEDIFF` functions for Snowflake, BigQuery, or Redshift, as each might have a slightly different syntax for calculating date differences. Instead, by using the `date_diff` macro, you can abstract away the database-specific differences and apply the same calculation in a database-agnostic manner.

For example, in the `customer_order_history` model, you can write:

```sql
{% macro date_diff(start_date, end_date) %}
datediff({{ start_date }}, {{ end_date }})
{% endmacro %}
```

This macro centralizes the date difference calculation. As written, it emits a generic `datediff` call; to make it fully database-agnostic, you can route it through the same `adapter.dispatch` mechanism so the correct SQL syntax is generated for the underlying warehouse, whether it’s Snowflake, BigQuery, or another platform. This approach makes your transformations cleaner, more maintainable, and faster to write.
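Here’s a hedged sketch of the dispatched version, assuming day-level granularity; the per-warehouse SQL is illustrative:

```sql
{% macro date_diff(start_date, end_date) %}
    {{ return(adapter.dispatch('date_diff')(start_date, end_date)) }}
{% endmacro %}

-- Works for warehouses with a datediff(part, start, end) function, e.g. Snowflake or Redshift
{% macro default__date_diff(start_date, end_date) %}
    datediff(day, {{ start_date }}, {{ end_date }})
{% endmacro %}

-- BigQuery reverses the argument order: date_diff(end, start, part)
{% macro bigquery__date_diff(start_date, end_date) %}
    date_diff({{ end_date }}, {{ start_date }}, day)
{% endmacro %}
```

In the `customer_order_history` model you would then call `{{ date_diff('first_order_date', 'last_order_date') }}` and let dbt pick the right implementation.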

Testing and validation

You can use macros to test and automate the validation of your data and catch potential issues early. This helps maintain the integrity of your data pipeline, ensuring that your transformations are accurate and reliable across different models and use cases.

Custom test creation

Custom tests are specific checks you define for your data models. They go beyond the default tests (like NULL checks) and enable you to enforce business-specific rules that ensure the correctness and consistency of the data in your models.

For instance, you might want to ensure that all orders have a non-negative total, that product prices are never zero, or that a certain date field always contains valid values. As an example, you can create a macro for a custom test that checks whether `order_total` is always positive.

```sql
{% macro test_positive_value(model, column) %}
select * from {{ model }} where {{ column }} < 0
{% endmacro %}
```

How this works:

  • `model`: This is the name of the model (for example, `orders_and_sales_summary`) where you want to apply the test.
  • `column`: This is the specific column in the model (in this case, `order_total`) that you want to check.
  • The query: The macro runs a simple SQL query that selects all rows from the specified model where the value of the specified column (`order_total`) is less than zero. If there are any such rows, it will return them, indicating that there’s an issue with the data.
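One hedged way to wire this up is as a singular test: a SQL file in your `tests/` directory that fails whenever it returns rows. The file name below is just an example:

```sql
-- tests/assert_order_total_positive.sql (hypothetical singular test)
-- Fails whenever the macro's query returns rows, i.e. when any order_total is negative
{{ test_positive_value(ref('orders_and_sales_summary'), 'order_total') }}
```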

Data quality checks

In addition to validating that values are within the expected range (like ensuring `order_total` is positive), macros can be used for other data quality checks such as:

Checking for duplicates: A macro can check for duplicate records based on certain columns. For example, if you want to check for duplicate `order_id`s in the `orders_and_sales_summary` model:

```sql
{% macro check_duplicates(model, column) %}
select {{ column }}, count(*)
from {{ model }}
group by {{ column }} having count(*) > 1
{% endmacro %}
```

This macro would identify and return any `order_id`s that appear more than once, so you can remove the duplicates to ensure data consistency.
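As with the positive-value check, one way to run this automatically is as a singular test; the file name is an assumption:

```sql
-- tests/assert_no_duplicate_order_ids.sql (hypothetical singular test)
-- Fails if any order_id appears more than once
{{ check_duplicates(ref('orders_and_sales_summary'), 'order_id') }}
```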

Checking for missing relationships: You also might want to ensure that every order in `orders_and_sales_summary` is linked to a valid customer. You can create a macro to check for missing relationships that looks like this:

```sql
{% macro check_missing_relationship(model, foreign_key, reference_model, reference_column) %}
select * from {{ model }}
where {{ foreign_key }} not in (select {{ reference_column }} from {{ reference_model }})
{% endmacro %}
```

This macro checks if there are any `foreign_key` values (like `customer_id`) in the `model` (e.g., `orders_and_sales_summary`) that do not exist in the `reference_model` (e.g., `customers`) under the `reference_column` (e.g., `customer_id`). If any records are missing the relationship, this will return those rows for investigation.

Project maintenance

Macros play a significant role in project maintenance tasks, such as generating documentation, managing dependencies, and standardizing table structures. You can write macros to automate these tasks, improving data quality, maintaining better organization, and fostering a strong data culture within your team.

Generating documentation

Clear documentation is crucial to ensuring that your data models are understandable and usable by the team, but writing detailed descriptions for every model and column can be time-consuming. Macros automate this process by pulling metadata or predefined information to generate documentation dynamically.

Imagine you have a model, `orders_export`, and you want to auto-generate a description for it. A macro could pull metadata, such as the source of the data or its purpose, and embed it into your documentation files.

This macro dynamically generates descriptions for your models based on parameters like the model name, source table, and purpose.

```sql
{% macro generate_description(model_name, source_table, purpose) %}
"This model, {{ model_name }}, aggregates data from {{ source_table }}. It is used for {{ purpose }}."
{% endmacro %}
```

Managing dependencies

Managing dependencies between models and external tables can become challenging when your transformations are complex, especially in large-scale projects with multiple interconnected datasets. 

Each model may rely on multiple upstream tables, and even a small change in one source can disrupt the entire pipeline, potentially causing failures or inconsistencies. Macros programmatically handle these dependencies with dynamic logic. 

Macros can check for table existence, identify outdated sources, or resolve dependencies at runtime. This ensures your pipelines are easy to maintain and reduces the risk of errors.

For instance, the following model SQL uses a `check_table_exists` helper macro so that downstream models don’t fail due to missing dependencies, which maintains the integrity of your data pipelines.

```sql
{% if check_table_exists('raw_orders') %}

    with orders as (
        select * from raw_orders
    ),

    cleaned_orders as (
        select
            order_id,
            customer_id,
            total_amount
        from orders
    )

    select * from cleaned_orders

{% else %}

    -- Create an empty table if raw_orders doesn't exist
    create table {{ target.schema }}.orders_summary as (
        select
            null as order_id,
            null as customer_id,
            null as total_amount
        where 1=0
    )

{% endif %}
```
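The `check_table_exists` helper referenced above isn’t a built-in dbt macro, so here’s one possible sketch of how it could be written, using `adapter.get_relation` to look the table up in the target schema (at parse time this may return none, so treat it as a sketch rather than a definitive implementation):

```sql
-- macros/check_table_exists.sql (hypothetical helper)
{% macro check_table_exists(table_name) %}
    {#- Returns true if a relation with this name exists in the target database/schema -#}
    {%- set relation = adapter.get_relation(
        database=target.database,
        schema=target.schema,
        identifier=table_name
    ) -%}
    {{ return(relation is not none) }}
{% endmacro %}
```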

Standardizing table structures

Inconsistent table structures can lead to issues such as mismatched schemas, unexpected errors in downstream processes, or data discrepancies. To address this, you can use macros to standardize table structures across models, which will ensure uniformity in column selection and formatting.

Here’s an example of such a macro:

```sql
{% macro enforce_schema(columns) %}
select {{ ", ".join(columns) }}
{% endmacro %}
```

The macro takes a list of column names as its input. This list represents the exact schema you want your table to follow: you decide which columns should appear and in what order.

Once the list is passed to the macro, it programmatically generates a `SELECT` statement that only includes the specified columns, in the order you define. This ensures that the resulting table adheres to a consistent schema, no matter how the source data might vary (even if it includes additional columns or has a different column order).

Every table or model built with this macro has the same structure as defined by the columns list, making your data pipelines more reliable and easier to maintain.
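Here’s a brief sketch of the macro in use; the model name and column list are assumptions for illustration:

```sql
-- models/orders_summary.sql (hypothetical model)
-- Compiles to: select order_id, customer_id, total_amount
{{ enforce_schema(['order_id', 'customer_id', 'total_amount']) }}
from {{ ref('raw_orders') }}
```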

How to write a macro: a step-by-step walkthrough

When dealing with data from multiple sources, column names often vary in format. For example, you might encounter:

  • `Customer Name` in one table
  • `customer_name` in another
  • `CUSTOMERNAME` elsewhere

These variations make queries harder to write and can lead to errors. For instance, you may accidentally forget to apply the same transformation to every column, resulting in mismatched or inconsistent data across models. This inconsistency can cause problems when joining tables or running aggregations, as the database won’t recognize the columns as equivalent.

A macro lets you apply the same column standardization across all of your models. Here’s a macro that standardizes column names by:

  • Converting them to lowercase
  • Trimming whitespace
  • Replacing spaces with underscores

```sql
{% macro standardize_column(column) %}
replace(lower(trim({{ column }})), ' ', '_')
{% endmacro %}
```

`trim({{ column }})`: Removes any leading or trailing spaces from the column name.

`lower()`: Converts the column name to lowercase for consistency.

`replace()`: Replaces any spaces in the column name with underscores to match a standard format.

Here’s how you can use this macro in a transformation. Suppose you have an `orders_snapshot` table with a column named `customer_name` that needs standardization.

```sql
select
  {{ standardize_column('customer_name') }} as standardized_name
from orders_snapshot
```

Now that you’ve written the macro, you need to test it to ensure it functions properly.

First, test the macro on a single column name:

```sql
select
  {{ standardize_column('customer_name') }} as standardized_name
from orders_snapshot
```

When the model compiles, the macro should render as `replace(lower(trim(customer_name)), ' ', '_')` in the generated SQL.

Next, apply the macro in a test model with mock data:

```sql
with orders_snapshot as (
    select 'John Doe' as customer_name
)

select
    {{ standardize_column('customer_name') }} as standardized_name
from orders_snapshot
```

The expected result should be john_doe.

Best practices for macro development

We've walked through different use cases for macros and how to write them, but let's end by covering some best practices to keep in mind when using macros across your transformations.

Naming conventions

Give your macros clear and descriptive names that reflect their purpose and functionality. For example, a macro called `standardize_column_name` immediately describes its purpose: to format column names in a consistent manner.

Clear and descriptive names ensure that team members can quickly understand what each macro does, improving collaboration and reducing confusion in the codebase. Avoid confusing or overly abbreviated names.

Documentation

Always document your macros to make them easier to understand and use. This can be achieved through in-line comments that explain the macro's functionality and parameters. For instance:

```sql
{% macro cast_to_integer(column) %}
-- Converts a column to an integer type
cast({{ column }} as integer)
{% endmacro %}
```

This example includes a brief comment that states the macro’s purpose. Documentation makes it easier for others (or your future self) to quickly grasp a macro’s purpose and intent. Proper documentation ensures that macros are easy to adopt across the team, reducing onboarding time and errors.

Testing

Thorough testing is essential to ensure your macros behave as expected. Test macros by creating mock models with representative data and validating their output. For example, if you’re using a macro to standardize column names, verify that the transformed names match your expectations. Consistent testing helps you identify issues early and ensures that the macro works consistently across models.

Nested macros

Nested macros involve using one macro within another to build more complex and reusable logic. For example, a base macro, `standardize_column`, transforms individual column names by converting them to lowercase and replacing spaces with underscores.

```sql
{% macro standardize_column(column) %}
replace(lower(trim({{ column }})), ' ', '_')
{% endmacro %}
```

Another macro, `standardize_columns`, goes through a list of column names and applies the `standardize_column` logic to each one.

```sql
{% macro standardize_columns(columns) %}
{%- set standardized_columns = [] -%}
{%- for col in columns -%}
    {%- set _ = standardized_columns.append(standardize_column(col)) -%}
{%- endfor -%}
{{ return(standardized_columns) }}
{% endmacro %}
```

By nesting the `standardize_column` macro within `standardize_columns`, you streamline the process of transforming multiple columns, ensuring data consistency while reducing duplication.

The base macro handles the transformation of individual column names and keeps the logic reusable for single-column transformations. The nested macro uses the base macro (`standardize_column`) within a loop and transforms a list of column names programmatically. The macro returns a list of standardized column names, ensuring consistency across multiple columns.
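Here’s a sketch of calling the nested macro from a model; the model and column names are assumptions. Because `standardize_columns` returns a list of SQL expressions, you can join them into a select list:

```sql
select
    {{ standardize_columns(['customer_name', 'product_name']) | join(', ') }}
from {{ ref('orders_snapshot') }}
```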

Using control structures in macros

Jinja control structures such as `if` statements and `for` loops, along with patterns that stand in for `try/except` blocks, give you the flexibility to handle conditional logic, iteration, and error handling when writing macros. Here’s a quick breakdown of their use:

`if` statements add conditional logic within your macro. This is particularly helpful when you need to verify conditions before applying transformations. For example, suppose you need to convert a column to an integer, but only when a column is actually passed in:

```sql
{% macro cast_column_to_integer(column) %}
    {% if column is not none %}
        cast({{ column }} as integer)
    {% else %}
        null
    {% endif %}
{% endmacro %}
```

This macro converts the column to an integer only when a column argument is provided; if the argument is `none`, it emits `null` instead of failing.

`for` loops allow you to iterate through lists, making it easy to apply transformations to multiple columns.

For example, if you need to standardize multiple column names in a transformation, you can write this macro:

```sql
{% macro standardize_multiple_columns(columns) %}
    {%- for col in columns -%}
        {#- Add a comma between items so the output forms a valid select list -#}
        {{ standardize_column(col) }}{%- if not loop.last %}, {% endif %}
    {%- endfor -%}
{% endmacro %}
```

This macro applies the `standardize_column` transformation to each column in the list, separating the results with commas so they can be dropped straight into a select list with uniform formatting.

Try/except blocks

`try/except` blocks handle potential errors or edge cases (situations that occur outside the normal operating conditions) when macros are invoked. However, you can’t use `try/except` with dbt macros as Jinja doesn’t support `try/except` blocks. Instead, you can use `execute` and the `run_query` function. 

For example, you can execute a query and check if it succeeded or failed using the `execute` variable. If the query fails, you can define a fallback logic to manage the error.

Here’s an example of how you can implement this:

```sql
{% macro example_macro() %}
  {% for item in my_list_of_dicts %}

    {% set delete_query %}
      -- some SQL logic
    {% endset %}

    {% set result = run_query(delete_query) %}

    {% if not execute %}
      -- If the query execution fails, handle the error
      SELECT 'Error executing delete query for {{ item.some_identifier }}' AS errormsg
    {% endif %}

  {% endfor %}
{% endmacro %}
```

`run_query` attempts to execute the SQL, and the conditional check (approximated here with the `execute` flag) returns a custom error message instead of letting the failure halt the run. This is similar to a `try/except` block in Python, where the “try” is represented by `run_query` and the “except” is handled by the conditional logic checking the execution result.

While dbt doesn't support `try/except` blocks, you can use Metaplane's dbt alerting tool to notify you of issues or failures that arise during model execution.

dbt macro FAQs

How do macros differ from packages?

Macros are reusable SQL logic, while packages are prebuilt dbt project extensions.

Can macros access source data?

No, macros generate SQL code; they don't directly interact with data.

How to handle macro dependencies?

Use the `ref()` function inside your macros and models to define dependencies between models so that dbt builds everything in the correct order.

Performance impact considerations

Avoid overusing nested macros to prevent complex, slow SQL.

Pair your dbt macros with Metaplane’s end-to-end observability

Pair your dbt macros with Metaplane’s end-to-end dbt observability to keep your data pipelines efficient and reliable. By incorporating macros into your workflow, you enhance both the efficiency and the consistency of your transformations, making it easier to scale your data operations while maintaining data quality.

With the ability to centralize complex SQL and Jinja code, macros improve collaboration and reduce errors, ensuring that your data transformations remain aligned as the volume of data and sources increases.

With Metaplane you can monitor your data transformations in real time, track the performance of your dbt models, and ensure more robust data quality. Catch issues early, track performance, and resolve any anomalies swiftly to maintain the integrity of your data pipeline.
