How Sigma uses Metaplane to track impacts to Sigma Workbooks
"Using Metaplane feels like having another data team member dedicated to keeping up and watching every change."
Sigma, a data-driven company, aims to democratize data exploration, which has traditionally been limited to people who know SQL. In a recent interview, Jake Hannan, Senior Manager of the Data Platform at Sigma, highlighted the importance of data engineering and the crucial role of educating internal users so that teams stay up to date on new data concepts and releases. Read on to learn more about Jake and how Sigma has benefited from Metaplane.
As Jake mentions:
❝ Working at Sigma, every user has full access to be a power user, which can expand the depth of analysis they can do. The main challenge is that our product and engineering teams ship features and improvements so quickly that it can be hard to stay on top of how to best use every new feature.
Fivetran ingests data from Sigma's business applications into Snowflake, where Jake's team uses dbt to clean and curate it. This gives business users access to essential datasets, including critical reporting workbooks used by every business department. In simplified form, the setup is: Fivetran → Snowflake (transformed with dbt) → Sigma workbooks.
The complexity of Sigma’s stack
❝ I think anyone working in data that has the idea that everything is fine, hasn’t been working in data long enough. As an example, a 200 person company could easily have 20+ tools across all domains that need to be ingested and modeled such that someone can analyze data to make decisions.
As a team of 5 supporting a company of over 400 people, Jake's data team has to ingest data from the business applications used across every area of the business. Although some analytics team members specialize in particular departments, handling that many data sources still involves:
- Understanding the raw data schemas down to field definitions
- Keeping up with any schema changes propagated by the data source, such as updates to custom fields in Salesforce
- Identifying the full lineage from raw data through to a Sigma workbook
- Creating a unified understanding of multiple disparate data sources
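To make the lineage item concrete, here is a minimal Python sketch of downstream impact analysis: given a lineage graph, a breadth-first search collects every Sigma workbook that reads, directly or indirectly, from a changed table. The table, model, and workbook names, the `workbook:` prefix, and the dictionary format are all hypothetical; this is an illustration, not how Metaplane, dbt, or Sigma represent lineage internally.

```python
from collections import deque

# Hypothetical lineage graph: each node maps to the nodes that read from it.
# Names and the "workbook:" prefix are illustrative, not a real product format.
LINEAGE: dict[str, list[str]] = {
    "raw.salesforce.opportunity": ["stg_opportunities"],
    "raw.salesforce.account": ["stg_accounts"],
    "stg_opportunities": ["fct_pipeline"],
    "stg_accounts": ["fct_pipeline", "dim_accounts"],
    "fct_pipeline": ["workbook:Revenue Forecast"],
    "dim_accounts": ["workbook:Account Health", "workbook:Revenue Forecast"],
}


def downstream(lineage: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search for every node reachable from `start`."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # also guards against cycles
                seen.add(child)
                queue.append(child)
    return seen


def impacted_workbooks(lineage: dict[str, list[str]], changed: str) -> list[str]:
    """Return the workbooks downstream of a changed table, sorted for stable output."""
    return sorted(n for n in downstream(lineage, changed) if n.startswith("workbook:"))


if __name__ == "__main__":
    # A change to the raw Salesforce account table reaches two workbooks.
    print(impacted_workbooks(LINEAGE, "raw.salesforce.account"))
```

Breadth-first search is used here only because it is a simple, standard way to compute reachability; any graph traversal would give the same set of impacted workbooks.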
While the team uses dbt tests as most teams do (e.g., checks for uniqueness, not-null values, and accepted values), manually scoping and maintaining thresholds and anomaly reporting for these objects would require additional work from the team. Jake also leans toward the "buy" side of the "build vs. buy" decision:
❝ We could invest heavily in doing this very well, but it wouldn’t be as straightforward to use, and be a burden on overall productivity. Metaplane also ships like crazy, and the added value of new features would be hard for us to replicate internally.
How Sigma uses Metaplane
Jake’s team relies on Metaplane to monitor several areas of the pipeline: “raw” data (e.g., Salesforce tables updated directly by Fivetran), “cleaned” datasets referenced directly by Sigma workbooks, and the dbt jobs that transform data from the “raw” to the “cleaned” state. By deploying monitors for data freshness, row count changes, and other metrics, the team can identify potential data quality issues proactively. If the row count delta suddenly spikes or drops, a source has likely changed dramatically upstream, as happens when the Sigma engineering team starts a project to clean up old development environments.
❝ Metaplane is deployed across datasets containing billions of records, but is able to use an XS warehouse for minimal impact to Snowflake compute.
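The row count signal described earlier in this section can be illustrated with a simple rule: compare two consecutive counts and flag any relative change larger than a threshold. This is a minimal Python sketch assuming a fixed 20% threshold chosen only for the example; it is not Metaplane's detection method, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RowCountCheck:
    """Hypothetical fixed-threshold row count monitor (illustration only)."""

    max_change: float = 0.2  # assumed threshold: flag swings larger than 20%

    def delta(self, previous: int, current: int) -> float:
        """Relative change between two consecutive row counts."""
        if previous == 0:
            # An empty table that gains rows has no finite relative change.
            return 0.0 if current == 0 else float("inf")
        return (current - previous) / previous

    def is_anomalous(self, previous: int, current: int) -> bool:
        """True when the count spiked or dropped by more than max_change."""
        return abs(self.delta(previous, current)) > self.max_change


if __name__ == "__main__":
    check = RowCountCheck()
    print(check.is_anomalous(1_000_000, 1_050_000))  # 5% growth -> False
    print(check.is_anomalous(1_000_000, 400_000))    # 60% drop -> True
```

A drop like the 60% example is the kind of change a cleanup of old development environments could cause. A fixed threshold is easy to reason about, but it has to be scoped and tuned per table by hand, which is exactly the maintenance work the team preferred not to take on.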
Jake has also set up Slack routing so that alerts about an issue or change land in the channel where they are most useful, leading to faster incident resolution and stakeholder notification. The channels include:
- Analytics Engineering: All incidents related to the primary database, which hosts dbt model inputs and outputs, are directed to the “Analytics Engineering” channel. Besides the data team, this channel includes approximately 80% of the company, since many business units rely heavily on data.
- Revenue Operations: Salesforce administrators change field definitions and names relatively often and may not be aware of the downstream impacts on Sigma workbooks. Routing schema change notifications and outages directly into this smaller, focused channel shows business stakeholders how changes made in an application affect workbooks and keeps them informed about issues in the workbooks themselves.
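The two routing rules above can be expressed as a small function that maps an alert to every channel that should receive it. This is a minimal Python sketch: the channel names, the database name, the `Alert` fields, and the alert kinds are hypothetical stand-ins, not Metaplane's or Slack's configuration format.

```python
from dataclasses import dataclass

# Hypothetical names; the real workspace and database names are not given in the source.
ANALYTICS_ENGINEERING = "#analytics-engineering"
REVENUE_OPERATIONS = "#revenue-operations"
PRIMARY_DATABASE = "ANALYTICS"


@dataclass(frozen=True)
class Alert:
    database: str  # database where the monitored table lives
    source: str    # upstream application, e.g. "salesforce"
    kind: str      # e.g. "schema_change", "outage", "freshness", "row_count"


def route(alert: Alert) -> list[str]:
    """Return every channel that should receive the alert (possibly several)."""
    channels: list[str] = []
    # Rule 1: any incident in the primary dbt database goes to the broad channel.
    if alert.database == PRIMARY_DATABASE:
        channels.append(ANALYTICS_ENGINEERING)
    # Rule 2: Salesforce schema changes and outages also reach the smaller RevOps channel.
    if alert.source == "salesforce" and alert.kind in {"schema_change", "outage"}:
        channels.append(REVENUE_OPERATIONS)
    return channels


if __name__ == "__main__":
    print(route(Alert("ANALYTICS", "salesforce", "schema_change")))
```

Returning a list rather than a single channel lets one alert reach both the broad data audience and the focused business audience, which matches the intent of keeping Salesforce administrators aware of downstream impacts without hiding the alert from the data team.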
All of this adds up to an improved data culture and greater trust in the data team, because stakeholders are aware of issues.
When it comes to preventing issues, dbt’s Slim CI is useful for running dbt tests as part of the quality assurance workflow, but it is limited to the tests already set up in dbt and provides no visibility into which Sigma workbooks a change would affect.
One small example that Jake shared was:
❝ We changed values to be lowercase for consistency, as opposed to camelcase, and we were able to use the test previews and lineage graphs to understand at a glance how Sigma dashboards would be impacted.
Summary
Thanks to Metaplane, Sigma achieved the following benefits:
- Integration with their entire data stack, enabling them to trace issues to their origin and effectively assess the potential impact on their Sigma instance.
- 5x faster time to issue detection, so they can identify and address problems quickly, leading to more efficient data operations and faster resolutions.
❝The main benefit is peace of mind. We maintain a vast ecosystem where it’s hard to stay on top of everything. We could have stood up infrastructure and processes to do it ourselves, but instead, using Metaplane really feels like having another person on the team dedicated to keeping up with every change.
To become the model Sigma user itself, Sigma aims to empower its business users to create and use workbooks that meet their specific needs. The team plans to achieve this through:
- Additional education: a repository that defines general data concepts, plus enablement content on new workbooks and features
- Scaling up usage: applying feedback from an internal user survey on how to make existing workbooks easier to use and more useful
- Analyzing behavioral trends: collecting product data to iterate on new features and understand adoption rates