January 28, 2025
3 ways to host and share dbt docs

dbt docs can be a treasure trove of insights for various stakeholders in your organization. From data engineers trying to understand a table’s lineage to analysts looking up the semantics of a column, dbt documentation serves as a living, breathing source of truth for your data architecture.

Getting these docs into the right hands can be transformative for data democratization and team productivity. It can also be tricky to do.

If you’re a dbt Cloud user, the dbt Cloud Explorer is your one-stop shop for spelunking into the facets of your dbt runs and generated documentation. However, if your team uses dbt Core—or if provisioning a dbt Cloud seat per stakeholder starts to become untenable—you’ll need to explore avenues for hosting the docs yourself.

In this post we’ll explore a few different ways you can go about hosting and sharing your dbt docs.

Generating dbt docs

If you’re not already familiar, the dbt command line interface (CLI) provides a `docs` command that can be used to dynamically generate documentation as a function of the current state of your project. In our specific case, the exact invocation of the command that we’re interested in is this one:

```bash
dbt docs generate --static
```

The important piece is the `--static` flag. While it’s not documented online, running `dbt docs generate --help` sheds some light.

```text
--static                        Generate an additional static_index.html
                                with manifest and catalog built-in.
```

Providing the `--static` flag tells `dbt docs generate` to encode the state of your project within a single `static_index.html` file. This makes it possible to host your docs by uploading one static file—no web server needed to serve the manifest and catalog data separately.
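To sanity-check the output before wiring anything into CI, you can generate the file and open it straight from the `target` directory (the `open` command below is macOS-specific; use your browser of choice elsewhere):

```bash
# Generate the self-contained docs file (written to target/static_index.html)
dbt docs generate --static

# Open it directly in a browser -- no web server needed
open target/static_index.html
```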

Automating dbt doc generation

To make the hosting process as seamless as possible, we want to automate the process of generating the docs per build. If you’re using dbt Core, the most natural place to generate the docs will be the pipeline that’s running your dbt build. If you’re using dbt Cloud, then go with the CI/orchestration environment that makes the most sense for your team.

Throughout this article, we’re going to assume GitHub Actions.

```yaml
- name: build
  run: dbt build

- name: generate docs
  if: github.ref == 'refs/heads/main'
  run: dbt docs generate --static
```

Since `dbt docs generate` runs metadata queries against your database, you’ll need to make sure that any secrets provided to your build command are also passed to the generate command. For brevity, we’re omitting flags/environment variables related to authentication.
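For example, if your `profiles.yml` reads warehouse credentials from environment variables, the generate step needs the same variables as the build step. A minimal sketch, assuming hypothetical Snowflake-style variable names stored as GitHub repository secrets:

```yaml
- name: generate docs
  if: github.ref == 'refs/heads/main'
  run: dbt docs generate --static
  env:
    # Hypothetical variable names -- use whatever your profiles.yml expects
    SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
    SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
    SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
```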

Additionally, we’re making the doc generation contingent on the branch. We’re assuming that we want our published docs to reflect the state of our `main` branch.

Three ways to host and share dbt docs

This isn’t a novel problem. All we want to do is host a static HTML file somewhere. That said, there are some subtleties to consider:

  1. We want to be able to automate the hosting process so that we can publish the latest docs after a build.
  2. There may be varying degrees of security concerns that need addressing.

We’re going to explore how to get the job done using three different approaches. Namely:

  • GitHub Pages
  • Netlify, a dedicated app hosting platform
  • AWS S3 (and maybe AWS Lambda)

While this list of approaches is far from exhaustive, it more or less runs the gamut in terms of high-level approaches you can take. The same ideas can be applied to your organization’s preferred infrastructure, e.g., GitLab, Azure, Vercel, etc.

Option 1: GitHub Pages

GitHub Pages allows you to host files from a dedicated git branch at `<github-org>.github.io/<repo-name>`. For example, if you push a file called `blog.html` to the dedicated GitHub Pages branch, you'd be able to access it at `<github-org>.github.io/<repo-name>/blog.html`.

This is the easiest path toward getting your dbt docs hosted—it should only take a few minutes. However, there is a curveball to consider: unless you’re using a GitHub Enterprise Cloud account, GitHub Pages are always public, even if the repo is private. Depending on your team’s security sensitivity, this may be a deal breaker.

Getting started

First, we need to create the dedicated GitHub Pages branch and disassociate it from your main branch. A common convention is to name this branch `gh-pages`. From your GitHub repo, run the following:

```bash
git checkout --orphan gh-pages
git rm -rf .
echo "hello world" > index.html
git add index.html
git commit -m 'initial commit'
git push origin gh-pages
```

This series of commands creates a new root branch named `gh-pages`, removes all files from it, creates a new `index.html` file, and pushes it up to the remote `gh-pages` branch.

At this point, we need to tell GitHub that we want to host the files in the new `gh-pages` branch. First, navigate to your repo's settings.

Once you’re in the Pages settings, configure the following:

  • Source: deploy from a branch
  • Branch: “gh-pages”
  • Folder: / (root)

Click Save.

It may take a couple of minutes, but eventually, you should be able to navigate to `<github-org>.github.io/<repo-name>` to view the `index.html` we pushed earlier.

We now have a "home" where we can store our generated dbt docs.

Automate publishing to GitHub pages

Now, all we need to do is update our CI pipeline to push the docs to our new `gh-pages` branch. To streamline the juggling of branches, we're going to use an off-the-shelf GitHub Action to push to `gh-pages`.

```yaml
- name: build
  run: dbt build

- name: generate docs
  if: github.ref == 'refs/heads/main'
  run: |
    dbt docs generate --static
    mkdir docs-tmp
    cp target/static_index.html docs-tmp/index.html

- name: publish docs
  if: github.ref == 'refs/heads/main'
  uses: JamesIves/github-pages-deploy-action@v4
  with:
    folder: docs-tmp
    branch: gh-pages
```

Note that we slightly modified the step that generates the docs to make the layout on disk more conducive to the publishing step.

That's it! Now, you can update your README and other documentation to link to `<github-org>.github.io/<repo-name>`, where you'll find the latest state of your dbt docs.

If you’re a GitHub Enterprise Cloud user, you can check out these docs to learn about setting the visibility of your pages to private.

Option 2: Netlify

One approach to hosting and sharing your dbt docs is using a dedicated hosting provider like Netlify. Netlify has options for protecting deployed pages using various authentication options, e.g., basic password auth, Google, Okta, etc. This makes it a great option for more security-conscious teams.

Before getting started, be advised that the Netlify CLI that we’ll use in CI for publishing requires Node 18.14.0 or later. Depending on your CI environment’s base image, you might need to make some changes.
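If you do need a newer Node version in your workflow, a setup step like the following (using the standard actions/setup-node action) is one way to handle it; adjust the version to whatever your pipeline requires:

```yaml
- name: setup node
  uses: actions/setup-node@v4
  with:
    node-version: 20
```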

Getting started

First, you’ll need to create a Netlify account. After you’ve done that, you’ll create your first Netlify site.

We’re choosing “Deploy manually” here because we’ll be responsible for the deploy process via our CI as opposed to using Netlify’s own CI environment.

At this point, the Netlify UI will prompt you to drag and drop a folder for your first deployment. The content doesn’t really matter at this phase since we’re just trying to create our first site. You can drop any dummy folder with an `index.html` within it. For example:

```bash
mkdir dummy-folder
echo "hello world" > dummy-folder/index.html
open .
```

However, if you’d like to mimic what we’re intending to do, you could just do the following locally:

```bash
dbt docs generate --static
mkdir docs-tmp
cp target/static_index.html docs-tmp/index.html
open .
```

Then, drag and drop the `docs-tmp` directory into the Netlify UI. Netlify will create a new site with the content you dropped. Take note of the randomly generated site name, as we’ll need to reference this later.

Automate publishing to Netlify

Before we can start using the Netlify CLI from CI, we’ll need to authenticate the CI environment with a Netlify auth token.

To create an auth token, navigate to User settings > Application > Personal access tokens.

Click on New access token and follow the prompts. Take note of the generated access token, as you won’t be able to view it again.

Now, we’ll need to make this token accessible from our CI pipeline. Different CI environments have different mechanisms for doing this. Since we’re assuming GitHub Actions, we’re going to use Repository Secrets. Navigate to your repo’s settings, then select Secrets and variables > Actions.

Create a new repository secret. Provide a name and the token value that you generated earlier.
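If you prefer the command line over the UI, the GitHub CLI can store the same secret (assuming `gh` is installed and authenticated against the repo):

```bash
# Prompts for the token value and stores it as a repository secret
gh secret set NETLIFY_AUTH_TOKEN
```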

Now, we’re ready to update our CI pipeline to publish our docs to Netlify.

```yaml
- name: build
  run: dbt build

- name: generate docs
  if: github.ref == 'refs/heads/main'
  run: |
    dbt docs generate --static
    mkdir docs-tmp
    cp target/static_index.html docs-tmp/index.html

- name: publish docs
  if: github.ref == 'refs/heads/main'
  run: |
    npm install netlify-cli
    npm exec netlify-cli deploy -- --site YOUR_SITE_NAME_HERE --dir docs-tmp
  env:
    NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
```

A few things to note:

  • We've updated the generate docs step to rename and move the doc file into a separate directory. This is just to make it easier to provide `docs-tmp` as input to the Netlify CLI.
  • We're installing the Netlify CLI and immediately invoking it, using the assigned site name from earlier along with the `docs-tmp` directory.
  • We're providing an environment variable named `NETLIFY_AUTH_TOKEN` that maps to the secret we stored earlier.

That's it! Now, every `main` dbt build will publish docs to `<site-name>.netlify.app`. Check out the Netlify docs if you want to learn more about how to protect your site.

Option 3: AWS

Storage providers like AWS S3, Google Cloud Storage, Cloudflare R2, and Azure Blob Storage are some of the more obvious places we can use to host static files. However, depending on your user permissions on these platforms and the expected security around your dbt docs, you might be several JIRA tickets away from getting something off the ground.

Nevertheless, we’ll outline how you can provision an S3 bucket for hosting your dbt docs, along with some options for implementing secure access to the bucket.

For this method, we’re going to assume that you have the AWS CLI installed on your machine.

If you’re on macOS and use Homebrew, run the following command:

```bash
brew update && brew install awscli
```

Otherwise, follow the directions here.
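However you install it, it's worth confirming the CLI has working credentials before running the commands below; `aws configure` and `aws sts get-caller-identity` are the usual checks:

```bash
# Store your access key, secret key, and default region locally
aws configure

# Confirm which IAM identity the CLI is acting as
aws sts get-caller-identity
```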

Publish to a publicly accessible bucket

If you’re not concerned about unauthorized access to your dbt docs, then creating a publicly accessible S3 bucket is a simple way to go.

First, we’ll create the bucket.

```bash
aws s3api create-bucket --bucket dbt-docs --region us-east-1
```

Now, we’ll configure the bucket to be publicly accessible.

```bash
# New buckets block public policies by default, so relax that first
aws s3api put-public-access-block --bucket dbt-docs --public-access-block-configuration "BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false"

# Allow anyone to read objects from the bucket
aws s3api put-bucket-policy --bucket dbt-docs --policy '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::dbt-docs/*"
        }
    ]
}'
```

Now, we'll provision a dedicated user for writing to this bucket. This is the user that we’ll use to publish docs from the CI pipeline.

```bash
aws iam create-user --user-name dbt-docs-writer

aws iam put-user-policy --user-name dbt-docs-writer --policy-name dbt-docs-writer-policy --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::dbt-docs/*",
                "arn:aws:s3:::dbt-docs"
            ]
        }
    ]
}'

aws iam create-access-key --user-name dbt-docs-writer
```

Take note of the generated tokens, as we’ll need them later.
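The last command prints the credentials as JSON; the `AccessKeyId` and `SecretAccessKey` fields (shown with placeholder values below) are what you'll store in CI later:

```json
{
    "AccessKey": {
        "UserName": "dbt-docs-writer",
        "AccessKeyId": "AKIA...",
        "Status": "Active",
        "SecretAccessKey": "...",
        "CreateDate": "..."
    }
}
```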

With the S3 bucket in place, you can skip down to the Automate dbt doc publishing to S3 section of this article.

Publish to a VPC-accessible bucket

If your company VPNs into a VPC for day-to-day operations, you can leverage that by creating a bucket that’s only accessible via a VPC endpoint.

This article isn’t the best place to regurgitate all of the thorough documentation on this subject. As this starts to get heavily into DevOps territory, it’d be wise to check whether your organization already has a process in place for restricting bucket access based on network constraints.
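As a rough illustration only (your networking team's setup will differ), a bucket policy can restrict reads to requests arriving through a specific VPC endpoint via the `aws:sourceVpce` condition key; the endpoint ID below is a placeholder:

```bash
# Sketch: deny reads unless the request comes through a specific VPC endpoint.
# vpce-0123456789abcdef0 is a placeholder -- substitute your own endpoint ID.
aws s3api put-bucket-policy --bucket dbt-docs --policy '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyReadsOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::dbt-docs/*",
            "Condition": {
                "StringNotEquals": {
                    "aws:sourceVpce": "vpce-0123456789abcdef0"
                }
            }
        }
    ]
}'
```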

Assuming that you’re able to lock down a bucket for reading properly, you’ll still need to provision a user that can write to the bucket from your CI process.

```bash
aws iam create-user --user-name dbt-docs-writer

aws iam put-user-policy --user-name dbt-docs-writer --policy-name dbt-docs-writer-policy --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::dbt-docs/*",
                "arn:aws:s3:::dbt-docs"
            ]
        }
    ]
}'

aws iam create-access-key --user-name dbt-docs-writer
```

Take note of the generated tokens, as we’ll need them later.

Publish to a lambda-accessible bucket

One way to lock down your S3 bucket is by making it accessible only via an authenticated proxy service. We can do just that by deploying a lambda and registering it as an OpenID Connect (OIDC) app within your identity provider, e.g., Okta, Azure, OneLogin, etc.

When a user tries to access a file via this lambda’s URL, a tiny bit of code will run to first authenticate with your IdP. If all is well, the lambda will read the file from S3 using read-only credentials and return the content to the end user.
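To make that flow concrete, here is a minimal sketch of what such a handler might look like, assuming an Express app wrapped for Lambda (e.g., with `serverless-http`), `express-openid-connect` for the IdP check, and the AWS SDK v3 S3 client; the example project you'll clone below may differ in its details:

```javascript
// Minimal sketch -- the cloned example project may be structured differently.
const express = require("express");
const serverless = require("serverless-http");
const { auth } = require("express-openid-connect");
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");

const app = express();
const s3 = new S3Client({ region: process.env.REGION });

// Redirect unauthenticated users to the IdP before serving anything
app.use(auth({ authorizationParams: { response_type: "id_token" } }));

// Any path maps to an object key in the docs bucket, e.g. /index.html
app.get("/*", async (req, res) => {
  const key = req.path.replace(/^\//, "") || "index.html";
  const object = await s3.send(
    new GetObjectCommand({ Bucket: process.env.S3_BUCKET, Key: key })
  );
  res.type("html").send(await object.Body.transformToString());
});

module.exports.handler = serverless(app);
```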

Treat this method as a proof-of-concept to inspire similar approaches and discuss related Cloud platform/IdP changes with your DevOps and IT teams, as they may already have solutions or conventions for these types of problems.

Note: these steps depend on you having Node and npm installed on your local machine.

Publish to a private bucket

We’re going to create an S3 bucket that’s only readable by our lambda.

First, create a role for the lambda.

```bash
aws iam create-role \
    --role-name dbt-docs-lambda \
    --assume-role-policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }]
    }'
```

While we’re here, we also need to make sure that this role has the permissions required to write to CloudWatch logs.

```bash
aws iam attach-role-policy \
    --role-name dbt-docs-lambda \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
```

Now, we’ll create the bucket with a policy that allows the lambda role to read from it.

```bash
aws s3api create-bucket \
    --bucket dbt-docs \
    --region us-east-1

aws s3api put-public-access-block \
    --bucket dbt-docs \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

aws s3api put-bucket-policy \
    --bucket dbt-docs \
    --policy "{
    \"Version\": \"2012-10-17\",
    \"Statement\": [
        {
            \"Sid\": \"LambdaAccess\",
            \"Effect\": \"Allow\",
            \"Principal\": {
                \"AWS\": \"$(aws iam get-role --role-name dbt-docs-lambda --query 'Role.Arn' --output text)\"
            },
            \"Action\": [
                \"s3:GetObject\",
                \"s3:ListBucket\"
            ],
            \"Resource\": [
                \"arn:aws:s3:::dbt-docs\",
                \"arn:aws:s3:::dbt-docs/*\"
            ]
        }
    ]
}"
```

Creating the lambda

Start by cloning the example project found here. After you’ve cloned it and `cd`'d into it, install your project's dependencies.

```bash
npm install
```

Open the `index.js` file and comment out the following lines:

```javascript
app.use(
  auth({
    authorizationParams: {
      response_type: "id_token",
    },
  })
);
```

We need to comment these lines out initially because there is a bit of a chicken-and-egg problem: we need a lambda URL and OIDC client credentials in order for that code to run successfully, but we can’t obtain those until we’ve deployed the lambda for the first time.

Now, we’ll create the initial lambda.

```bash
aws lambda create-function \
    --function-name dbt-docs \
    --runtime nodejs22.x \
    --zip-file fileb://function.zip \
    --handler index.handler \
    --role $(aws iam get-role --role-name dbt-docs-lambda --query 'Role.Arn' --output text)
```

Next, create a URL for the function and print it out.

```bash
aws lambda create-function-url-config \
    --function-name dbt-docs \
    --auth-type NONE

aws lambda get-function-url-config --function-name dbt-docs
```

At this point, if all went well, you should be able to navigate to your lambda’s assigned URL using your browser and see “hello world.”
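A quick way to check from the terminal (substituting your own function URL):

```bash
# Should return the placeholder "hello world" response
curl -s https://your-lambda-url.lambda-url.us-east-1.on.aws/
```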

Create an OIDC application

The lambda we just deployed needs to be able to communicate with your IdP. Now, we need to log in to your IdP and register a new OIDC application that references the lambda. We’re going to assume Okta for this step, but the same steps apply to all OIDC-supporting IdPs.

Once in Okta, navigate to Application and click Create Application Integration.

Select OIDC as the Sign-in method and Web application as the Application type then continue.

On the next screen, configure the app integration. The key setting is the sign-in redirect URI, which should point at your lambda's URL (with the library defaults used here, that is the lambda URL followed by `/callback`); you'll also want to assign access to the appropriate users or groups.

After the OIDC app has been created, you should have access to a client ID and a client secret. Store these for later.

Update the lambda

Now that we have a lambda URL and OIDC application, we can configure the lambda to authenticate requests.

First, uncomment the code that we commented out earlier.

```javascript
app.use(
  auth({
    authorizationParams: {
      response_type: "id_token",
    },
  })
);
```

Now, create a new `.env` file in the root of the project directory and paste the following into it.

```bash
ISSUER_BASE_URL=https://your-okta-domain.okta.com
CLIENT_ID=your-oidc-app-client-id
CLIENT_SECRET=your-oidc-app-client-secret
BASE_URL=https://your-lambda-url.lambda-url.us-east-1.on.aws
SECRET=LONG_RANDOM_VALUE
S3_BUCKET=your-s3-bucket-name
REGION=us-east-1
```

Most of these should be self-explanatory, with the exception of SECRET. From the underlying library’s docs:

"The secret(s) used to derive an encryption key for the user identity in a stateless session cookie."

Just make sure this is a long random string.
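One easy way to produce such a value (assuming `openssl` is available on your machine):

```bash
# Prints 64 hex characters of cryptographically random data
openssl rand -hex 32
```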

The `package.json` in the project comes with some convenience scripts to make updating the lambda’s code and environment variables more ergonomic.
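If you're curious what scripts like these typically do, they usually wrap the standard AWS CLI update calls, roughly along these lines (a sketch, not the project's exact scripts):

```bash
# Re-zip the handler code and push it to the existing function
zip -r function.zip index.js package.json node_modules
aws lambda update-function-code --function-name dbt-docs --zip-file fileb://function.zip

# Push the values from .env into the function's configuration
# (abbreviated here -- include every variable from your .env file)
aws lambda update-function-configuration \
    --function-name dbt-docs \
    --environment "Variables={ISSUER_BASE_URL=https://your-okta-domain.okta.com,S3_BUCKET=your-s3-bucket-name,REGION=us-east-1}"
```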

Since we updated our code, we’ll need to redeploy.

```bash
npm run deploy
```

We’ll need to wait a few seconds for the update to complete before setting the environment variables using:

```bash
npm run update-function-configuration
```

If you run the second command too soon, you may see an error like this:

```text
An error occurred (ResourceConflictException) when calling the UpdateFunctionConfiguration operation: The operation cannot be performed at this time. An update is in progress for resource
```

Just try again a few seconds later.

Create a dbt doc writer user

We’re going to need an IAM user and associated credentials in order to write to the S3 bucket from our CI pipeline.

```bash
aws iam create-user --user-name dbt-docs-writer

aws iam put-user-policy --user-name dbt-docs-writer --policy-name dbt-docs-writer-policy --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::dbt-docs/*",
                "arn:aws:s3:::dbt-docs"
            ]
        }
    ]
}'

aws iam create-access-key --user-name dbt-docs-writer
```

Keep note of these credentials as we’re going to need them later when updating the CI pipeline.

At this point, all we need to do is automate the publishing from CI. Once you’ve completed the steps below, you’ll be able to access the published docs by navigating to the lambda’s URL while appending the name of the file, e.g., `index.html`.

Automate dbt doc publishing to S3

Before we can start using the AWS CLI from CI, we’ll need to authenticate the CI environment with AWS IAM credentials. Different CI environments have different mechanisms for doing this. Since we’re assuming GitHub Actions, we’re going to use Repository Secrets. Navigate to your repo’s settings, then select Secrets and variables > Actions.

Now, we'll need to create two new repository secrets: `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. Provide the respective value to each.

GitHub Actions environments already come with the `aws` CLI. If you’re not using GitHub Actions you’ll need to explore options for making it available, e.g., installing it in an earlier step or updating your container’s image.

Now, we’re ready to update our CI pipeline to publish our docs to S3.

```yaml
- name: build
  run: dbt build

- name: generate docs
  if: github.ref == 'refs/heads/main'
  run: dbt docs generate --static

- name: publish docs
  if: github.ref == 'refs/heads/main'
  run: aws s3 cp target/static_index.html s3://dbt-docs/index.html
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    AWS_DEFAULT_REGION: us-east-1
```

That's it! Any dbt build that runs on your `main` branch will now publish the generated docs to S3.

Share dbt docs with your entire team

If dbt is an essential part of your data pipeline, then accessible, up-to-date documentation is too. For dbt Core users, this can be tricky, but with any of the three methods outlined above, you can make publishing documentation a seamless part of your dbt workflow.
