Census Activates Data Quality Improvements with Metaplane
“Census helps me amplify analytics engineering work by ensuring data is used for more than just reporting. Metaplane helps me move faster, more confidently, and with greater trust from stakeholders.”
Learn how Julie, Head of Data at Census, thinks about establishing a data program and the role that Metaplane has to play.
There are two things that will stand out to you in this case study:
- Census - This is probably a tool that you’re either using right now, or considering using in the near future. Census pioneered the Reverse ETL category to sync data from your data warehouse into your company’s SaaS applications. Data teams at Canva, ClickUp, and Genesis Motor use Census to personalize marketing campaigns, prioritize sales leads, and automate countless other business processes.
- Julie Beynon - If you think you’ve seen this name before, you’re right (this is her second case study). Now in her second Head of Data role, she’s the first dedicated data hire at Census, overseeing all analytics responsibilities.
How data is used at Census
If you were to join Julie’s team today, the first thing you would probably notice when querying Census’s Snowflake instance is a database named Cholula, after the Great Pyramid of Cholula (not the hot sauce). Immediately after that, you’d realize that the data team owns analytics for applications across the entire company, covering:
- Marketing analytics - using data from sources such as Facebook Ads, Hubspot, and LinkedIn Ads
- Product analytics - with product usage data stored in their PostgreSQL instance
- Revenue analytics - using data from their CRM, Salesforce, and their billing system, Nue.io
Julie shared this piece of insight when it comes to setting up a data stack:
❝ I wouldn’t start a new job if they didn’t have or weren’t open to having Census, dbt, and Metaplane. After getting a warehouse, I recommend setting up dbt to model your data and immediately answer ad-hoc questions. After that, I’d implement Census to directly feed the data into business applications to minimize downtime between insight and decision, and I’d layer Metaplane over all of this to make sure that we were only activating with accurate data.”
Experience with Metaplane
Julie had previously found success with Metaplane’s free tier. By moving to a paid plan at Census, she was able to keep building trust in data, and to do so faster than she had before.
Finding data quality issues
Metaplane uses machine learning to find anomalies in your data. When incidents are found, alerts are sent to a collaboration tool such as Slack, Microsoft Teams, or PagerDuty, to fit within common workflows. Julie and the team at Census focus on two groups of issues:
- Schema Changes - In a recent case, the Operations team at Census added new custom fields in Salesforce. The resulting schema change alert prompted Julie to outline a plan of action for accommodating these new fields in ongoing analytics.
In other words, schema change alerts helped bridge the gap between business operations teams and downstream business stakeholders, while educating everyone in the Slack channel on data dependencies.
- Anomalous data updates - Through a combination of monitors for freshness, row count, and dbt job duration, Julie and the broader Census team are able to catch the majority of issues as they occur. Freshness and row count monitors are used heavily in the Cholula database mentioned at the beginning of this piece, which is also where their dbt job outputs are materialized. By coordinating these different types of monitors across the dbt outputs, they’re able to identify production issues that would impact business stakeholders relying on this modeled data.
❝ Metaplane’s freshness, row count, and dbt job duration monitor alerts give us early indicators of data issues. Most perceived data incidents stem from stale or partial data, caused either by ingestion or modeling errors. By using Metaplane, we can often ‘intercept’ bad data and alert stakeholders before the next time they log into the business applications that Census is feeding data into.”
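To make the idea of freshness and row count monitors concrete, here is a minimal sketch in Python. It is not Metaplane’s implementation (which, as noted above, uses machine learning); it assumes a fixed maximum age for freshness and a simple z-score against recent history for row counts. The function names, the threshold, and the sample numbers are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def is_stale(last_updated: datetime, now: datetime, max_age: timedelta) -> bool:
    """Flag a table whose most recent update is older than the allowed age."""
    return now - last_updated > max_age


def is_row_count_anomalous(
    history: list[int], current: int, z_threshold: float = 3.0
) -> bool:
    """Flag a row count that sits far from recent history.

    Assumption: a plain z-score stands in for a real anomaly model.
    """
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold


# Hypothetical daily row counts for one modeled table.
recent_counts = [1000, 1010, 990, 1005, 995]
partial_load = is_row_count_anomalous(recent_counts, 500)
normal_load = is_row_count_anomalous(recent_counts, 1002)

# Hypothetical timestamps: the table was last updated 36 hours ago.
stale = is_stale(
    last_updated=datetime(2024, 1, 1, 0, 0),
    now=datetime(2024, 1, 2, 12, 0),
    max_age=timedelta(hours=24),
)
```

A stale table or a sharp drop in row count maps to the “stale or partial data” that Julie describes, which is why these two checks are often paired on the same modeled tables.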
Improving data stack ROI
Catching anomalous dbt job durations removes one troublesome aspect of running dbt that has nothing to do with the modeling itself, which encourages further usage. But what about the rest of the data stack?
As Julie continues to build the data strategy, she and the team have also been able to use Metaplane to:
- Move faster to analyze new datasets - In addition to Census data activation feeds, the team also uses Omni to find additional business context. Julie has recently been onboarding additional schemas and tables for data that has historically been used in Omni, which requires her to adjust object references in her queries. By using Metaplane’s column-level lineage, she is able to quickly understand how tables are used downstream to ensure a smooth cutover.
❝ I’ve been able to make changes so much faster by having the context of upstream and downstream dependencies at my fingertips.”
- Optimize Snowflake spend - As the first hire for all things data, Julie was also tasked with reviewing Snowflake spend to make sure that historical artifacts were cleaned up and credits were used wisely. To do this, she used Metaplane’s Snowflake Spend Analysis feature to understand where credits were being used most heavily, including frequent jobs, compute-intensive queries, and even service users.
❝ The real-time notification feature for unexpected high or low spend is particularly useful, as it allows for immediate action and adjustments. This proactive approach to managing Snowflake usage not only helps in controlling costs but also ensures that our data infrastructure is efficient and well-suited to our organization's needs.”
How Census uses the Census integration
When an issue is discovered, Census uses Metaplane’s brand-new Census integration to understand how their own Census syncs are impacted. Using Metaplane’s automatically generated column-level lineage graphs, Julie and the team are able to understand which Census destinations are downstream of any data quality issues.
❝ I’m able to immediately see which business applications are affected and use that to notify our stakeholders that we’re looking into resolving their issue. It’s not only helped us understand the impact immediately, but also how to better prioritize resolving issues based on the workflows that were affected.”
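The impact analysis described here amounts to walking a lineage graph downstream from the affected column and keeping only the nodes that are sync destinations. A minimal sketch of that traversal follows, assuming the lineage is available as a plain adjacency mapping. The graph, the column names, and the `sync:` labels are hypothetical and do not reflect how Metaplane or Census represent lineage.

```python
from collections import deque


def downstream_nodes(lineage: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first walk returning every node reachable from `start`."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def affected_destinations(
    lineage: dict[str, list[str]], start: str, destinations: set[str]
) -> set[str]:
    """Keep only the downstream nodes that are sync destinations."""
    return downstream_nodes(lineage, start) & destinations


# Hypothetical column-level lineage: each key feeds the columns or syncs listed.
lineage = {
    "raw.accounts.plan": ["analytics.accounts.plan_tier"],
    "analytics.accounts.plan_tier": ["sync:salesforce", "analytics.revenue.mrr"],
    "analytics.revenue.mrr": ["sync:hubspot"],
    "raw.events.name": ["analytics.usage.event_count"],
}
destinations = {"sync:salesforce", "sync:hubspot"}

impacted = affected_destinations(lineage, "raw.accounts.plan", destinations)
unaffected = affected_destinations(lineage, "raw.events.name", destinations)
```

In this made-up graph, an issue in `raw.accounts.plan` reaches both destinations, while an issue in `raw.events.name` reaches none, which is the kind of distinction that helps prioritize which stakeholders to notify first.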
Issue Prevention
A growing data strategy built on an existing data stack calls for a lot of changes, which is directly reflected in the volume of dbt-related pull requests. Being relatively new to the data team, Julie mentions that it’s been useful to have Metaplane’s Data CI/CD feature forecast the downstream changes that would result from updates to those dbt models.
❝ Having the Metaplane app directly in GitHub sitting next to my dbt PR has been so useful to help me prevent making changes that I didn’t want to. By seeing the deltas in values of dependent models, I can proactively go to stakeholders, educate them on the changes I’m making, and identify what impact it’d have on their work. It’s a necessary step to retain the trust that the company has in data.”
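One way to picture “deltas in values of dependent models” is to summarize a model’s output before and after a change and subtract the two summaries. The sketch below does that for an in-memory list of rows. It is an illustration under stated assumptions, not the Data CI/CD feature itself: the `summarize` and `summary_delta` helpers, the `amount` column, and the sample rows are all hypothetical.

```python
def summarize(rows: list[dict], numeric_column: str) -> dict[str, float]:
    """Compute a few summary statistics for one model's output."""
    values = [row[numeric_column] for row in rows if row[numeric_column] is not None]
    return {
        "row_count": len(rows),
        "non_null": len(values),
        "total": sum(values),
    }


def summary_delta(
    before: dict[str, float], after: dict[str, float]
) -> dict[str, float]:
    """Return after minus before for every summary statistic."""
    return {name: after[name] - before[name] for name in before}


# Hypothetical output of a dependent model on the main branch and on the PR branch.
main_rows = [{"amount": 100}, {"amount": 50}, {"amount": None}]
pr_rows = [{"amount": 100}, {"amount": 50}]

delta = summary_delta(
    summarize(main_rows, "amount"),
    summarize(pr_rows, "amount"),
)
```

Here the PR drops one row without changing the total, which is the sort of difference worth raising with stakeholders before merging rather than after.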
The future of Census’ data program
As a data technology company, data isn’t only something that Census cares about for its customers. It sits at the beating heart of the business.
As Julie implements more reporting and data activation workflows, it becomes increasingly important for the team to ensure that data is clean.
❝ Census helps me amplify my analytics engineering work by ensuring data is used for more than just reporting. Metaplane doesn’t only ensure I avoid regressing on that impact – it helps me move faster, more confidently, and with greater trust from my stakeholders.”