Comparing Snowflake Dynamic Tables with dbt
Snowflake is one of the most popular data lakehouses today - and for good reason; they not only made it extremely easy to manage the infrastructure traditionally associated with data warehouses, such as scaling storage and compute, but continue to push the envelope, with new features such as Dynamic Tables, recently highlighted again at Snowflake Summit 2023 as it enters its Preview phase.
What are Snowflake Dynamic Tables?
Dynamic tables are a new type of table in Snowflake, created from other existing objects upstream, that update (i.e. re-run its query) when its parent table(s)’ update. This is useful for any sort of modeling where you intend on reusing the results, and need the data to be current. The image below, taken from Snowflake’s documentation, is a simplified overview of how Dynamic Tables are created.
RThe “automated refresh process” natively refreshes on fixed intervals (i.e. x number of minutes, y number of hours, etc) rather than specific times of the day. A workaround with tasks can be used if you wish to use a cron schedule instead.
What is dbt?
dbt is one of the most popular frameworks today for transformations (i.e. modeling), in part, due to its ability to increase accessibility and collaboration in modeling through features such as SQL-based modeling and a centralized code repository with a history of documentation. Two core parts of dbt include:
- A profile configuration file (in YAML) - here is where you’ll specify how and which data warehouse you’ll be connecting to
- Model logic (i.e. sql files) - a collection of reusable queries (i.e. models) that you intend on reusing.
Note that the use of dbt is not constrained to Snowflake. A full list of supported integrations can be found here.
Snowflake dynamic tables vs dbt
After listening to the Snowflake presentation - my mind immediately leapt to: does this replace dbt? (which is where this article came from). The short answer is “no”. Here’s why:
Similarities
Starting with similarities - it’s easy to be confused on the role of Snowflake’s Dynamic Tables in a world where many organizations have already / are looking into implementing dbt for transformations.
- Both Dynamic Tables and dbt will update an “object” in the warehouse based on the results of your query. An exception here is that dbt can create views and (normal) tables in addition to Dynamic Tables.
- The primary modeling language for both are SQL.
- Both have some ability to self-reference. dbt models can reference others models, and Dynamic Tables can be created and updated based on other Dynamic Tables.
Differences
Once we get past the upfront benefit of automating updates to the data fed through models, there are quite a few differences between Snowflake Dynamic Tables and dbt, with a non-exhaustive list of:
- Frequency of updates: dbt runs are batched (or micro-batched), which means that models are updated at fixed times throughout the day, whereas Dynamic Tables will update once “source” data is detected to be updated (with a user-configured time lag in place).
- Modularity: dbt model code can be easily reused in other models. Although you can reference Dynamic Tables in your Dynamic Table creation, you’ll create dependencies with the biggest issue likely being time lag.
- Version Control: In part because dbt code is stored in git repositories, users will get the benefit of version control - that is, the ability to enforce code structure validation tests prior to merging the code into production, where the git repository also gives you the added benefit of storing a history of all changes.
- Validation Rules (Tests): dbt also has a native feature that can be built into model runs, “dbt tests”. These are data validation rules that users specify and can also reuse, much like models, which allows consistency in model outputs.
- Lineage + Documentation: dbt generates a diagram that shows which objects are referenced in your model, and allows users to define variables, such as the Owner of a model. Both of these features allow for clearer context into what a model is used for(and how to use it.
How to use Snowflake Dynamic Tables with dbt
In case it’s not clear, if you’re already using dbt - there’s no need to move everything over to Dynamic Tables. If you’re keen on migrating something to take advantage of this new native Snowflake feature, a good candidate would be existing models being run in your Snowflake instance with a combination of Streams + Tasks.
Shortly after Dynamic Tables entered Preview, dbt released support for Dynamic Table creation (Note: you must be viewing docs for dbt v1.6 to see the section. Scroll to the top of the docs to set your version number).This is amazing because it effectively gives you the best of both worlds all of the benefits of a git environment for your modeling code (e.g. version control), built-in lineage and documentation, AND updates to models automatically triggered by refreshed data (independent of the schedule in your profile.yml).
You likely won’t want to migrate any of the dbt models that you’re currently triggering at a daily or 12 hour cadence, but any dbt runs (and downstream data products) that require near-real-time data would be good candidates for testing Dynamic Table usage.
Upcoming releases for Snowflake’s Dynamic Tables
Dynamic Tables are still fairly new, so we should reasonably expect a few things to change:
- Expanded dbt support for Dynamic Tables: You can track the scope of the changes here, with the first step (dynamic table as a materalization option) already having been implemented within ~2 months.
- Expanded function support for Dynamic Tables: One gap right now is the lack of support for a few non-deterministic functions (e.g. CURRENT_DATE()), and the inability to trigger Tasks or Stored Procedures. We’ll likely see support for this over time.
- Ecosystem integration improvements: One notable callout has been an issue with PowerBI where the list of queryable objects in Snowflake don’t always show all of the Dynamic Tables. Other integrations that don’t currently support Dynamic Tables will likely build in support shortly.
- Git support: Take this with a grain of salt, as it seems that the timeline of this release varies depending on who you talk to and when, and there’s no official released scope, but it appears that Snowflake is building support for hosting code (e.g. Snowsight worksheets) in a git repository.
Dynamic Tables Data Quality
Both Snowflake Dynamic Tables and dbt improve automation for your modeling, but have limited support for ensuring data quality when it comes to your data. While dbt tests are a great starting point for validation rules, they fall short when it comes to asynchronous deployment at scale across all of your tables, with one tangible point being a lack of automation for acceptance test thresholds. This is here a tool like Metaplane would shine, ensuring data quality for Dynamic Tables (created through dbt or otherwise) and other objects in your Snowflake instance along with alerting on anomalous dbt job durations.
Data teams at high-growth companies (like Drift, Vendr, and SpotOn) use the Metaplane data observability platform to save engineering time and increase trust in data by understanding when things break, what went wrong, and how to fix it — before an executive messages them about a broken dashboard.
Table of contents
Tags
...
...