3 ways to host and share dbt docs
Are you a dbt Core user who's struggling to find a good way to host and share your dbt docs? Well, this post is for you. Follow these step-by-step instructions to share your dbt documentation via GitHub, Netlify, or AWS—whichever fits best into your workflow.
dbt docs can be a treasure trove of insights for various stakeholders in your organization. From data engineers trying to understand a table’s lineage to analysts looking up the semantics of a column, dbt documentation serves as a living, breathing source of truth for your data architecture.
Getting these docs into the right hands can be transformative for data democratization and team productivity. It can also be tricky to do.
If you’re a dbt Cloud user, the dbt Cloud Explorer is your one-stop shop for spelunking into the facets of your dbt runs and generated documentation. However, if your team uses dbt Core—or if provisioning a dbt Cloud seat per stakeholder starts to become untenable—you’ll need to explore avenues for hosting the docs yourself.
In this post we’ll explore a few different ways you can go about hosting and sharing your dbt docs.
Generating dbt docs
If you’re not already familiar, the dbt command line interface (CLI) provides a `docs` command that can be used to dynamically generate documentation as a function of the current state of your project. In our specific case, the exact invocation of the command that we’re interested in is this one:
```bash
dbt docs generate --static
```
The important piece is the `--static` flag. While it’s not documented online, running `dbt docs generate --help` sheds some light.
```text
--static    Generate an additional static_index.html
            with manifest and catalog built-in.
```
Providing the `--static` flag tells `dbt docs generate` to encode the state of your project within the single `static_index.html` file. This makes it possible to host your docs by uploading a single static file—no server required to provide data.
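If you want to sanity-check the output before setting up any hosting, you can generate the file locally and open it straight from dbt's `target` directory (`open` assumes macOS; use `xdg-open` on Linux or open the file from your browser):

```bash
dbt docs generate --static
open target/static_index.html
```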
Automating dbt doc generation
To make the hosting process as seamless as possible, we want to automate the process of generating the docs per build. If you’re using dbt Core, the most natural place to generate the docs will be the pipeline that’s running your dbt build. If you’re using dbt Cloud, then go with the CI/orchestration environment that makes the most sense for your team.
Throughout this article, we’re going to assume GitHub Actions.
```yaml
- name: build
  run: dbt build
- name: generate docs
  if: github.ref == 'refs/heads/main'
  run: dbt docs generate --static
```
Since `dbt docs generate` runs metadata queries against your database, you’ll need to make sure that any secrets provided to your build command are also passed to the generate command. For brevity, we’re omitting flags/environment variables related to authentication.
Additionally, we’re making the doc generation contingent on the branch. We’re assuming that we want our published docs to reflect the state of our `main` branch.
Three ways to host and share dbt docs
This isn’t a novel problem. All we want to do is host a static HTML file somewhere. That said, there are some subtleties to consider:
- We want to be able to automate the hosting process so that we can publish the latest docs after a build.
- There may be varying degrees of security concerns that need addressing.
We’re going to explore how to get the job done using three different approaches. Namely:
- GitHub Pages
- Netlify, a dedicated app hosting platform
- AWS S3 (and maybe AWS Lambda)
While this list of approaches is far from exhaustive, it more or less runs the gamut in terms of high-level approaches you can take. The same ideas can be applied to your organization’s preferred infrastructure, e.g., GitLab, Azure, Vercel, etc.
Option 1: GitHub Pages
GitHub Pages allows you to host files from a dedicated git branch at `<github-org>.github.io/<repo-name>`. For example, if you push a file called `blog.html` to the dedicated GitHub Pages branch, you'd be able to access it at `<github-org>.github.io/<repo-name>/blog.html`.
This is the easiest path toward getting your dbt docs hosted—it should only take a few minutes. However, there is a curveball to consider: unless you’re using a GitHub Enterprise Cloud account, GitHub Pages are always public, even if the repo is private. Depending on your team’s security sensitivity, this may be a deal breaker.
Getting started
First, we need to create the dedicated GitHub Pages branch and disassociate it from your main branch. A common convention is to name this branch `gh-pages`. From a local clone of your repo, run the following:
```bash
git checkout --orphan gh-pages
git rm -rf .
echo "hello world" > index.html
git add index.html
git commit -m 'initial commit'
git push origin gh-pages
```
This series of commands creates a new root (orphan) branch named `gh-pages`, removes all files from it, creates a placeholder `index.html`, commits it, and pushes it up to the remote `gh-pages` branch.
At this point, we need to tell GitHub that we want to host the files in the new `gh-pages` branch. First, navigate to your repo's settings and open the Pages section.
Once you’re in the Pages settings, configure the following:
- Source: deploy from a branch
- Branch: “gh-pages”
- Folder: / (root)
Click Save.
It may take a couple of minutes, but eventually, you should be able to navigate to `<github-org>.github.io/<repo-name>` to view the `index.html` we pushed earlier.
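If you prefer checking from the terminal, a quick request against the Pages URL (substituting your own org and repo) should return a 200 once the deployment has propagated:

```bash
curl -I "https://<github-org>.github.io/<repo-name>/"
```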
We now have a "home" where we can store our generated dbt docs.
Automate publishing to GitHub Pages
Now, all we need to do is update our CI pipeline to push the docs to our new `gh-pages` branch. To streamline the juggling of branches, we're going to use an off-the-shelf GitHub Action to push to the `gh-pages` branch.
```yaml
- name: build
  run: dbt build
- name: generate docs
  if: github.ref == 'refs/heads/main'
  run: |
    dbt docs generate --static
    mkdir docs-tmp
    cp target/static_index.html docs-tmp/index.html
- name: publish docs
  if: github.ref == 'refs/heads/main'
  uses: JamesIves/github-pages-deploy-action@v4
  with:
    folder: docs-tmp
    branch: gh-pages
```
Note that we slightly modified the step that generates the docs to make the layout on disk more conducive to the publishing step.
That's it! Now, you can update your README and other documentation to link to `<github-org>.github.io/<repo-name>`, where you'll find the latest state of your dbt docs.
If you’re a GitHub Enterprise Cloud user, you can check out these docs to learn about setting the visibility of your pages to private.
Option 2: Netlify
One approach to hosting and sharing your dbt docs is using a dedicated hosting provider like Netlify. Netlify has options for protecting deployed pages using various authentication options, e.g., basic password auth, Google, Okta, etc. This makes it a great option for more security-conscious teams.
Before getting started, be advised that the Netlify CLI that we’ll use in CI for publishing requires Node 18.14.0 or later. Depending on your CI environment’s base image, you might need to make some changes.
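You can check what your runner (or local machine) ships with before installing the CLI; if Node is too old, upgrade it in an earlier CI step (for example, with the `actions/setup-node` action on GitHub Actions):

```bash
# The Netlify CLI requires Node 18.14.0 or later
node --version

# If the version checks out, the CLI installs like any other npm package
npm install netlify-cli
```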
Getting started
First, you'll need to create a Netlify account. After you've done that, you'll create your first Netlify site.
We’re choosing “Deploy manually” here because we’ll be responsible for the deploy process via our CI as opposed to using Netlify’s own CI environment.
At this point, the Netlify UI will prompt you to drag and drop a folder for your first deployment. The content doesn’t really matter at this phase since we’re just trying to create our first site. You can drop any dummy folder with an `index.html` within it. For example:
```bash
mkdir dummy-folder
echo "hello world" > dummy-folder/index.html
open .
```
However, if you’d like to mimic what we’re intending to do, you could just do the following locally:
```bash
dbt docs generate --static
mkdir docs-tmp
cp target/static_index.html docs-tmp/index.html
open .
```
Then, drag and drop the `docs-tmp` directory into the Netlify UI. Netlify will create a new site with the content you dropped. Take note of the randomly generated site name, as we’ll need to reference this later.
Automate publishing to Netlify
Before we can start using the Netlify CLI from CI, we’ll need to authenticate the CI environment with a Netlify auth token.
To create an auth token, navigate to User settings > Application > Personal access tokens.
Click on New access token and follow the prompts. Take note of the generated access token, as you won’t be able to render it again.
Now, we'll need to make this token accessible from our CI pipeline. Different CI environments have different mechanisms for doing this. Since we're assuming GitHub Actions, we're going to use Repository Secrets. Navigate to your repo's settings, then select Secrets and variables > Actions.
Create a new repository secret. Provide a name and the token value that you generated earlier.
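If you'd rather script this than click through the UI, the GitHub CLI can set the secret too (this assumes `gh` is installed and authenticated, and uses `NETLIFY_AUTH_TOKEN` as the name the workflow below expects):

```bash
# Paste the token when prompted, or pipe it in from your secret manager
gh secret set NETLIFY_AUTH_TOKEN
```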
Now, we’re ready to update our CI pipeline to publish our docs to Netlify.
```yaml
- name: build
  run: dbt build
- name: generate docs
  if: github.ref == 'refs/heads/main'
  run: |
    dbt docs generate --static
    mkdir docs-tmp
    cp target/static_index.html docs-tmp/index.html
- name: publish docs
  if: github.ref == 'refs/heads/main'
  run: |
    npm install netlify-cli
    npm exec netlify-cli deploy -- --site YOUR_SITE_NAME_HERE --dir docs-tmp
  env:
    NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
```
A few things to note:
- We've updated the generate docs step to rename and move the doc file into a separate directory. This is just to make it easier to provide `docs-tmp` as input to the Netlify CLI.
- We're installing the Netlify CLI and immediately invoking it, passing the site name from earlier along with the `docs-tmp` directory.
- We're providing an environment variable named `NETLIFY_AUTH_TOKEN` that maps to the secret we stored earlier.
That's it! Now, every `main` dbt build will publish docs to `<site-name>.netlify.app`. Check out the Netlify docs if you want to learn more about how to protect your site.
Option 3: AWS
Storage providers like AWS S3, Google Cloud Storage, Cloudflare R2, and Azure Blob Storage are some of the more obvious places we can use to host static files. However, depending on your user permissions on these platforms and the expected security around your dbt docs, you might be several JIRA tickets away from getting something off the ground.
Nevertheless, we’ll outline how you can provision an S3 bucket for hosting your dbt docs, along with some options for implementing secure access to the bucket.
For this method, we’re going to assume that you have the AWS CLI installed on your machine.
If you’re on macOS and use Homebrew, run the following command:
```bash
brew update && brew install awscli
```
Otherwise, follow the directions here.
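Either way, make sure the CLI is on your PATH and authenticated against the right account before running the commands below. How you authenticate (static keys via `aws configure`, SSO, or an assumed role) depends on how your organization manages AWS access:

```bash
aws --version
# Confirms which account and principal the CLI is currently acting as
aws sts get-caller-identity
```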
Publish to a publicly accessible bucket
If you’re not concerned about unauthorized access to your dbt docs, then creating a publicly accessible S3 bucket is a simple way to go.
First, we’ll create the bucket.
```bash
aws s3api create-bucket --bucket dbt-docs --region us-east-1
```
Now, we’ll configure the bucket to be publicly accessible.
```bash
# New buckets have Block Public Access enabled by default, so relax that before
# attaching a public bucket policy.
aws s3api put-public-access-block --bucket dbt-docs --public-access-block-configuration "BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false"

aws s3api put-bucket-policy --bucket dbt-docs --policy '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::dbt-docs/*"
    }
  ]
}'
```
Now, we'll provision a dedicated user for writing to this bucket. This is the user that we’ll use to publish docs from the CI pipeline.
```bash
aws iam create-user --user-name dbt-docs-writer

aws iam put-user-policy --user-name dbt-docs-writer --policy-name dbt-docs-writer-policy --policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::dbt-docs/*",
        "arn:aws:s3:::dbt-docs"
      ]
    }
  ]
}'

aws iam create-access-key --user-name dbt-docs-writer
```
Take note of the generated tokens, as we’ll need them later.
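If you'd like to confirm the new credentials work before wiring up CI, you can do a one-off upload with them; substitute the `AccessKeyId` and `SecretAccessKey` values printed by `create-access-key`:

```bash
AWS_ACCESS_KEY_ID="your-access-key-id" \
AWS_SECRET_ACCESS_KEY="your-secret-access-key" \
  aws s3 cp target/static_index.html s3://dbt-docs/index.html
```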
With the S3 bucket in place, you can skip down to the Automated publishing section of this article.
Publish to a VPC-accessible bucket
If your company VPNs into a VPC for day-to-day operations, you can leverage that by creating a bucket that’s only accessible via a VPC endpoint.
This article isn’t the best place to regurgitate all of the thorough documentation on this subject. As this starts to get heavily into dev ops territory, it’d be wise to check if your organization already has a process in place for restricting bucket access based on network constraints.
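That said, the core of the pattern is a bucket policy that only allows reads arriving through your VPC endpoint. Here's a minimal sketch; the endpoint ID is a placeholder, and your organization's policy will likely be stricter:

```bash
aws s3api put-bucket-policy --bucket dbt-docs --policy '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadsFromVpcEndpointOnly",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::dbt-docs/*",
      "Condition": {
        "StringEquals": { "aws:SourceVpce": "vpce-1234567890abcdef0" }
      }
    }
  ]
}'
```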
Assuming that you’re able to lock down a bucket for reading properly, you’ll still need to provision a user that can write to the bucket from your CI process.
```bash
aws iam create-user --user-name dbt-docs-writer

aws iam put-user-policy --user-name dbt-docs-writer --policy-name dbt-docs-writer-policy --policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::dbt-docs/*",
        "arn:aws:s3:::dbt-docs"
      ]
    }
  ]
}'

aws iam create-access-key --user-name dbt-docs-writer
```
Take note of the generated tokens, as we’ll need them later.
Publish to a lambda-accessible bucket
One way to lock down your S3 bucket is by making it accessible only via an authenticated proxy service. We can do just that by deploying a lambda and registering it as an OpenID Connect (OIDC) app within your identity provider, e.g., Okta, Azure, OneLogin, etc.
When a user tries to access a file via this lambda’s URL, a tiny bit of code will run to first authenticate with your IdP. If all is well, the lambda will read the file from S3 using read-only credentials and return the content to the end user.
Treat this method as a proof of concept to inspire similar approaches, and discuss the related cloud platform and IdP changes with your DevOps and IT teams; they may already have solutions or conventions for these kinds of problems.
Note: these steps depend on you having Node and npm installed on your local machine.
Publish to a private bucket
We’re going to create an S3 bucket that’s only readable by our lambda.
First, create a role for the lambda.
```bash
aws iam create-role \
  --role-name dbt-docs-lambda \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }]
  }'
```
While we’re here, we also need to make sure that this role has the permissions required to write to CloudWatch logs.
```bash
aws iam attach-role-policy \
--role-name dbt-docs-lambda \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
```
Now, we’ll create the bucket with a policy that allows the lambda role to read from it.
```bash
aws s3api create-bucket \
  --bucket dbt-docs \
  --region us-east-1

aws s3api put-public-access-block \
  --bucket dbt-docs \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

aws s3api put-bucket-policy \
  --bucket dbt-docs \
  --policy "{
    \"Version\": \"2012-10-17\",
    \"Statement\": [
      {
        \"Sid\": \"LambdaAccess\",
        \"Effect\": \"Allow\",
        \"Principal\": {
          \"AWS\": \"$(aws iam get-role --role-name dbt-docs-lambda --query 'Role.Arn' --output text)\"
        },
        \"Action\": [
          \"s3:GetObject\",
          \"s3:ListBucket\"
        ],
        \"Resource\": [
          \"arn:aws:s3:::dbt-docs\",
          \"arn:aws:s3:::dbt-docs/*\"
        ]
      }
    ]
  }"
```
Creating the lambda
Start by cloning the example project found here. After you’ve cloned it and `cd`'d into it, install your project's dependencies.
```bash
npm install
```
Open the `index.js` file and comment out the following lines:
```javascript
app.use(
  auth({
    authorizationParams: {
      response_type: "id_token",
    },
  })
);
```
We need to comment these lines out initially because there is a bit of a chicken-and-egg problem: we need a lambda URL and OIDC client credentials in order for that code to run successfully, but we can’t obtain those until we’ve deployed the lambda for the first time.
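Note that the `create-function` call below expects a `function.zip` in the project root. If the example project doesn't already provide a packaging script, zipping the handler and its dependencies yourself looks roughly like this:

```bash
# Bundle the handler, its dependencies, and package.json into the archive Lambda will run
zip -r function.zip index.js node_modules package.json
```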
Now, we’ll create the initial lambda.
```bash
aws lambda create-function \
  --function-name dbt-docs \
  --runtime nodejs22.x \
  --zip-file fileb://function.zip \
  --handler index.handler \
  --role $(aws iam get-role --role-name dbt-docs-lambda --query 'Role.Arn' --output text)
```
Create a URL for it and log it out.
```bash
aws lambda create-function-url-config \
  --function-name dbt-docs \
  --auth-type NONE

aws lambda get-function-url-config --function-name dbt-docs
```
At this point, if all went well, you should be able to navigate to your lambda’s assigned URL using your browser and see “hello world.”
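You can also fetch it from the terminal; the `--query` flag pulls just the URL out of the config we created:

```bash
curl "$(aws lambda get-function-url-config --function-name dbt-docs --query FunctionUrl --output text)"
```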
Create an OIDC application
The lambda we just deployed needs to be able to communicate with your IdP. Now, we need to log in to your IdP and register a new OIDC application that references the lambda. We're going to assume Okta for this step, but the process is similar for any OIDC-supporting IdP.
Once in Okta, navigate to Applications and click Create App Integration.
Select OIDC as the Sign-in method and Web application as the Application type, then continue.
On the next screen, you’ll need to do the following:
- Give your application a name, like “dbt Docs.”
- In the Grant type section, check “Refresh token” and “Implicit (hybrid).”
- Set the Sign-in redirect URI to https://MY-LAMBDA-URL/callback, e.g., https://random-numbers-and-letters.lambda-url.us-east-1.on.aws/callback. Don't forget the `/callback` at the end.
- Set the Sign-out redirect URL to your lambda’s URL.
- Under Assignments select the users that should be able to access this application.
- Click Save.
After the OIDC app has been created, you should have access to a client ID and a client secret. Store these for later.
Update the lambda
Now that we have a lambda URL and OIDC application, we can configure the lambda to authenticate requests.
First, uncomment the code that we commented out earlier.
```javascript
app.use(
  auth({
    authorizationParams: {
      response_type: "id_token",
    },
  })
);
```
Now, create a new `.env` file in the root of the project directory and paste the following into it.
```ini
ISSUER_BASE_URL=https://your-okta-domain.okta.com
CLIENT_ID=your-oidc-app-client-id
CLIENT_SECRET=your-oidc-app-client-secret
BASE_URL=https://your-lambda-url.lambda-url.us-east-1.on.aws
SECRET=LONG_RANDOM_VALUE
S3_BUCKET=your-s3-bucket-name
REGION=us-east-1
```
Most of these should be self-explanatory, with the exception of SECRET. From the underlying library’s docs:
"The secret(s) used to derive an encryption key for the user identity in a stateless session cookie."
Just make sure this is a long random string.
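One easy way to generate such a value (assuming `openssl` is available) is:

```bash
openssl rand -hex 32
```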
The `package.json` in the project comes with some convenience scripts to make updating the lambda’s code and environment variables more ergonomic.
Since we updated our code, we’ll need to redeploy.
```bash
npm run deploy
```
We’ll need to wait a few seconds for the update to complete before setting the environment variables using:
```bash
npm run update-function-configuration
```
If you run the second command too soon, you may see an error like this:
```text
An error occurred (ResourceConflictException) when calling the UpdateFunctionConfiguration operation: The operation cannot be performed at this time. An update is in progress for resource
```
Just try again a few seconds later.
Create a dbt doc writer user
We’re going to need an IAM user and associated credentials in order to write to the S3 bucket from our CI pipeline.
```bash
aws iam create-user --user-name dbt-docs-writer

aws iam put-user-policy --user-name dbt-docs-writer --policy-name dbt-docs-writer-policy --policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::dbt-docs/*",
        "arn:aws:s3:::dbt-docs"
      ]
    }
  ]
}'

aws iam create-access-key --user-name dbt-docs-writer
```
Keep note of these credentials as we’re going to need them later when updating the CI pipeline.
At this point, all we need to do is automate the publishing from CI. Once you’ve completed the steps below, you’ll be able to access the published docs by navigating to the lambda’s URL while appending the name of the file, e.g., `index.html`.
Automate dbt doc publishing to S3
Before we can start using the AWS CLI from CI, we’ll need to authenticate the CI environment with AWS IAM tokens. Different CI environments have different mechanisms for doing this. Since we’re assuming GitHub Actions, we’re going to use Repository Secrets. Navigate to your repo’s settings, then Secrets and variables > Actions.
Now, we'll need to create two new repository secrets: `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. Provide the respective value to each.
GitHub Actions environments already come with the `aws` CLI. If you’re not using GitHub Actions you’ll need to explore options for making it available, e.g., installing it in an earlier step or updating your container’s image.
Now, we’re ready to update our CI pipeline to publish our docs to S3.
```yaml
- name: build
  run: dbt build
- name: generate docs
  if: github.ref == 'refs/heads/main'
  run: dbt docs generate --static
- name: publish docs
  if: github.ref == 'refs/heads/main'
  run: aws s3 cp target/static_index.html s3://dbt-docs/index.html
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    AWS_DEFAULT_REGION: us-east-1
```
Finally, we’re finished. Any dbt builds that occur on your main branch will now publish the generated docs to S3.
Share dbt docs with your entire team
If dbt is an essential part of your data pipeline, good documentation is just as important. For dbt Core users, hosting and sharing that documentation can be tricky, but with any of the three methods outlined above, you can make it a seamless part of your dbt workflow.