How to manage dbt alert fatigue
Getting too many alerts is, functionally, the same as getting no alerts. Learn how to tame your dbt alerts and make them meaningful for you and your team.
There's this weird catch-22 that happens with alerts in the data world. It feels like the more alerts you get, the less helpful they actually become.
Flying blind with no alerts isn't the solution either: you'll miss critical issues that need attention. On the other hand, when your Slack lights up like a Christmas tree with notifications, you might as well have no alerts at all, because everyone will just start ignoring them. It's especially telling when you see messages like "Does anyone know if this is an issue?" scattered among the alerts.
That's a clear sign that your team has started treating alerts as background noise rather than actionable information. When real problems are getting lost in a sea of notifications, it's time to take a step back and rethink your approach.
While we can't (and shouldn't) get rid of alerts entirely, we can definitely make them far more manageable. Here's how to strike that balance between staying informed and staying sane.
1. Leverage owners, tags, and documentation
Think of your dbt project metadata like a good map: it helps you navigate when things go wrong. You need a comprehensive system that clearly defines who owns what and why it matters.
Start by assigning ownership. When a model breaks, nobody should have to play detective to figure out who needs to know about it.
Make sure to document the business context too. What dashboards depend on this model? What decisions are being made with this data? This might feel like overkill, but enriching your metadata helps your team quickly understand the impact when something breaks.
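As a rough sketch, here's what that kind of metadata might look like in a model's properties file. The model name, owner handle, and tag values below are placeholders, not a prescribed convention:

```yaml
version: 2

models:
  - name: fct_daily_revenue
    description: >
      Daily revenue rollup. Feeds the Finance KPI dashboard, so a failure here
      means month-end reporting goes stale.
    meta:
      owner: "@alice"   # who should hear about it first when this model breaks
    config:
      tags: ["finance", "tier_1"]   # handy later for routing and prioritizing alerts
```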
2. Only monitor and test what you need to
When thinking about dbt testing, there’s a tendency for teams to monitor everything possible in the pursuit of quality data. This can get extremely redundant, though, and bog down your engineers in what’s more or less a vanity pursuit.
Instead, use dbt tests for your known knowns: the issues that should unquestionably stop your pipeline from running.
If the severity of a dbt test is set to warn instead of error, consider removing the test or turning it into a continuous monitor. As your codebase and team grow, piling up dbt tests with warn severity causes alert fatigue and leads people to ignore dbt alerts, because in many cases no action is ever taken on these less severe warnings.
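For example, a known known can be pinned to severity: error so the run actually stops, while anything you'd only ever set to warn is a candidate for a continuous monitor instead. A minimal sketch (the model and column names are placeholders):

```yaml
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          # Known knowns: a missing or duplicate order_id should stop the run
          - not_null:
              config:
                severity: error
          - unique:
              config:
                severity: error
```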
3. Route your alerts to the right people
Earlier we mentioned including owners in the metadata of your dbt models. These assigned owners should also be the first ones to know when a run fails.
If you're alerting your entire team, they're going to begin ignoring those alerts: the alerts will constantly be going off, but will rarely be something any given person can actually fix. By leveraging metadata or tags, you can route alerts to the technical owner who knows the context, how severe the issue is, and how to solve it.
```yaml
version: 2

models:
  - name: users
    meta:
      owner: "@alice"
      model_maturity: "in dev"
```
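Once the owner lives in meta like this, your alerting layer, whether that's a tool like Metaplane's dbt alerting or your own Slack integration, can pick up that field and @mention the right person directly instead of pinging the whole channel.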
4. Provide richer context
Traditionally with dbt alerts, it's difficult to know what actually went wrong. You get an alert that something failed, but no context on which model or test. That means someone has to go pore over logs and docs to figure out what the issue actually is before they can start solving it.
When an alert goes out, it should include everything needed to start solving the problem immediately. That’s why we built our own stand-alone dbt alerting tool.
Without getting too into the weeds about how we built it, our version of dbt alerts includes context like the failing models and tests, error logs, and run times, so engineers know straight away how severe an issue is, what's affected, and how to fix it.
5. Loop in stakeholders when you need to
This might be a controversial opinion, or it might not, but here it is: not every stakeholder needs to know about every data pipeline issue. In reality, they only need to know when something affects their ability to do their job.
Instead of alerting stakeholders any time there’s an issue, regardless of where it is, aim to only loop them in when you know a downstream data product they rely on—like a dashboard or an ML model—is affected.
When you involve them, make sure to share helpful context that shows what went wrong, and how you’re working to resolve the issue. An example message might read:
“Hey team, I wanted to give you a heads up that we identified a data refresh issue causing some of the data that your dashboards use to be out of date since 10AM EST. We have identified the root cause and are working on getting this data refreshed within the next hour. Please let me know if you have any questions.”
Much more useful than `ETL_JOB_1234 FAILED: Exit code 1`.
If you can include which dashboards were impacted, that’s even better.
6. Scale your alerting strategy with your team
Your alerting strategy needs to grow up alongside your team. What works for a three-person data team won't cut it when you've got multiple teams across different time zones.
For smaller teams that own dbt along with most of the rest of the data pipeline, you can probably get away with sending everything to a #data-alerts channel because everyone knows the whole pipeline anyway. You can use Slack threads to loop in others when needed, and this should meet your needs perfectly fine.
As your team grows and begins to split responsibilities, things need to get more sophisticated. Different teams will own different parts of the pipeline, and your alerts should reflect that. Lean on those ownership tags and team designations in your metadata to send dbt alerts to the analytics engineering team (or team responsible for transforming data) and at-mention technical owners.
For larger organizations with multiple teams using dbt, try to avoid the "alert everyone about everything" trap. The mobile analytics team probably doesn't need to know about the finance team's reconciliation jobs failing.
Use dbt owners and tags in your metadata to route alerts to different channels based on team ownership. That way, people only get alerts for the issues that matter to them, not a catch-all notification that's irrelevant nine times out of ten.
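As a sketch of what that routing might look like, the same meta and tags can carry a team designation that your alerting setup maps to a channel. The team names, channels, and the alert_channel key itself are hypothetical conventions here, not built-in dbt behavior:

```yaml
version: 2

models:
  - name: revenue_reconciliation
    meta:
      owner: "@finance-data"
      alert_channel: "#finance-data-alerts"   # hypothetical hint your alerting layer can read
    config:
      tags: ["finance"]

  - name: mobile_sessions
    meta:
      owner: "@mobile-analytics"
      alert_channel: "#mobile-analytics-alerts"
    config:
      tags: ["mobile"]
```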
Final thoughts
Tackling dbt alerts might feel like a low-priority issue, but trust us, once you do, life will be a lot easier for you and your team.
When you leverage owners and tags to route your alerts and provide richer context, your team will not only stop ignoring their alerts, but they’ll be able to triage and resolve the issues faster, too.
Set up more context-rich alerts today using Metaplane’s free dbt alerting tool. We have some helpful docs to get you started, as well.