Vivian Health Improves Data Quality with dbt and Metaplane to Connect Healthcare Professionals with their Dream Job
"Metaplane helped us achieve last quarter’s data quality OKR."
At Vivian Health, data is everything - it’s what underpins their core platform that connects healthcare professionals with the right job opportunities.
Max Calehuff, Data Engineer at Vivian Health, sheds light on the “transformative” initiatives undertaken by the company's data team, which handles the machine learning modeling for their product and also serves data to the rest of the company to inform decisions. To serve that need for data, Max sits on a team of 10 that includes data engineers, analysts, and machine learning engineers. He and one other data engineer tackle everything from:
- Creating and maintaining pipelines
- Data cleansing
- Source integration research
- Ensuring data integrity stays high after
- Translating business asks into data models
Vivian’s Data Stack
To serve the vast data needs of Vivian, the team set up a best-in-class data stack.
- At the core of their infrastructure, they utilize Snowflake as their data warehouse solution.
- Upstream sources include multiple PostgreSQL databases that serve different workloads, various business applications, events captured by Segment, and more. Vivian Health uses various data ingestion methods, combining Stitch, Airflow, Kinesis Firehose, and in-house builds, to handle data integration into Snowflake.
- Looker, a powerful business intelligence tool, empowers their teams with insightful visualizations and reporting capabilities.
- The team leverages dbt, a transformation tool, to enhance their data modeling processes.
- Census facilitates reverse ETL to pipe curated data back into business applications such as Salesforce or Amplitude.
- Vivian set up Metaplane integrations to Postgres, dbt, Snowflake, and Looker.
How Vivian uses dbt to expand modeling capabilities
The Vivian team knew they needed a powerful and accessible transformation tool to align with the powerful capabilities of Snowflake Data Cloud and underpin their customer experience.
Everything from analytics tables to machine learning models is transformed and modeled in dbt. One example is the training data for the ML models used to show the most accurate recommendations to job seekers. The data team models data from various upstream sources (their CRM, web traffic, and application events) in dbt, using DRY code to create repeatable data models fit for the scale needed to ensure that their job recommendations continue to get better.
Vivian’s data team also relies on dbt to model data for internal use cases including:
- Analytics: the same web and application data models feed into internal product performance dashboards, as well as customer success and marketing dashboards, which are used to increase customer retention and usage rates, as well as inform their organic search strategy to attract more customers.
- In-App Decision Making: Beyond Looker dashboards, the team funnels their dbt models further downstream using Census to send modeled data feeds back into business applications themselves, making it even more convenient for internal stakeholders to draw insights in the tools they’re familiar with.
Across use cases, the team benefits from dbt Cloud’s testing so data practitioners can write and test analytics code all in one place. As Max says:
❝ The ability to load sample data into dbt allows us to verify that our models work as intended before deployment.”
Transformative Solution
Being able to create models in both SQL and Python in dbt has allowed the data team's analysts to quickly create models without waiting for the infrastructure-focused data engineers to assist them.
In the past, Max has been part of teams that relied on performing transformations exclusively within business intelligence (BI) tools like Tableau and Looker. Max recalls that when using Persistent Derived Tables in Looker to hold transformed data, the approach not only required specialized LookML knowledge to set up, but also led to more fragile code, resulting in wasted time on constant maintenance.
With dbt, the sheer number of models and supported dashboards speaks for itself. By having one tool where the data team can write analytics code in SQL or Python, they’ve effectively tripled the number of people on the team who are able to create models.
Going forward, Max says “we want to use the Python functionality more - it’s already given us results”.
This frees up time for Max and other engineers on the team to focus on more strategic initiatives, such as improving the machine learning models and writing advanced tests.
❝ Everyone on the team knows how to create a dbt model, and with only 2 data engineers that traditionally were responsible for modeling, that’s pretty cool. I can make a new dbt model as fast as it takes me to write a SQL query - it’s so much faster than how we did things at my previous roles. I could not go back, ever.”
How Vivian uses Metaplane to improve data quality coverage
With customer careers and, by extension, healthcare patient experiences at stake, Vivian’s data team knew that data quality needed to be prioritized. The search for a data observability tool came shortly after incorporating other components into their data stack, prompted by a series of data incidents whose downstream impacts were difficult to identify and equally difficult to fix.
The usual solution would be to deploy unit tests on their pipeline, but the data team’s previous experiences pointed out several issues with this workflow:
- Time consuming: Setting up tests for a single pipeline could range anywhere from an hour to an entire day.
- Test accuracy: Test thresholds would need to be updated over time as their data and expectations changed.
- Scale for new objects: They’d be continuing to use more data, both ingesting net new sources as well as creating new models.
All of this led to the need to look for a more scalable solution to data quality monitoring.
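The test accuracy issue above can be illustrated with a minimal Python sketch. It is not Metaplane's method or Vivian's actual tests. The function names, the row counts, and the three-standard-deviation rule are all assumptions chosen for illustration. It shows how a hand-picked floor can go stale once a table grows, while a bound derived from recent history still catches a sudden drop.

```python
from statistics import mean, stdev


def fixed_threshold_ok(row_count: int, minimum: int = 1000) -> bool:
    """Static test: passes while the row count stays above a hand-picked floor."""
    return row_count >= minimum


def rolling_bounds_ok(history: list[int], row_count: int, k: float = 3.0) -> bool:
    """Adaptive test: passes when the new value lies within k standard
    deviations of the mean of recent history."""
    mu = mean(history)
    sigma = stdev(history)
    return abs(row_count - mu) <= k * sigma


# Hypothetical daily row counts for a table that has grown far past the old floor.
history = [50_000, 51_200, 49_800, 50_500, 50_900, 49_600, 50_300]
todays_count = 5_000  # a roughly 90% drop, e.g. from a broken load

print(fixed_threshold_ok(todays_count))          # True: the stale floor misses the drop
print(rolling_bounds_ok(history, todays_count))  # False: the drop is flagged
```

Keeping the static test accurate would mean revisiting `minimum` by hand for every table as volumes change, which is the maintenance burden the team wanted to avoid.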
Luckily, the team was already using dbt at that time and started off by implementing dbt tests. They were able to capture incidents with causes such as stale data, but wanted to build on that success with more test types and additional object coverage.
Higher data quality, better patient care
With the implementation of Metaplane, Vivian Health now possesses a comprehensive solution to monitor incidents across their entire data pipeline. Data quality is monitored from the upstream transactional PostgreSQL database through to the dbt-modeled tables in Snowflake, with the ability to see how incidents impact dashboards in Looker.
❝ If I get a Metaplane alert, it’s always something that’s gone wrong, which is what we want - we want to be sure that we’re catching everything without over-alerting.”
The practical application of Metaplane within Vivian Health's operations spans use cases similar to those mentioned above. Two examples:
- SEO Analytics: For web analytics, Metaplane plays a role in verifying the stream of events, facilitating proactive conversations where the data team can alert the web team to adjust their ingestion pipeline.
- Product development: To support the Vivian Health platform, Metaplane helps ensure that healthcare professionals receive the most relevant job postings by monitoring training data for the recommendation models, in addition to other user experience improvements, such as models that indicate when a posting might be stale or inaccurate.
After researching and evaluating other data observability solutions, Vivian Health found Metaplane to be the most user-friendly, cost-effective, and, most importantly, capable of identifying their data quality issues. Their implementation improved their ability to monitor data integrity for all of their critical objects, resulting in improved data reliability and more efficient incident resolution, while saving time spent setting up and maintaining acceptable thresholds for data quality tests. Outside of the product, Max also had this to say about the team:
❝ We've been using Metaplane for at least a year and a half ... I've seen all the many improvements that Metaplane's made. It's pretty cool to talk directly to developers and 90% of the time, when I bring up a feature it's either being worked on, or, what happens a lot is - they'll just tell me 'Oh we fixed that already - I'll turn it on for you.'”
Using dbt with Metaplane
Today, Vivian Health still uses dbt Cloud and Metaplane side by side. Not only do they continue to rely on dbt tests, but they also use Metaplane’s monitors to track the outputs of those models, while deploying similar monitor types on their tables. As any good data engineer knows, it’s good practice to set up redundancy, particularly when data is so important to the success of the company.
One use case where both tools shine together is in their relatively new use of Snowpark. The team wanted to use Snowpark’s ability to execute more complex Python transformations to generate training data weekly. In the process of testing Snowpark, they set up a new compute warehouse and infrastructure efficiency monitoring to track warehouse usage. Rather than undergo the arduous process of calling Snowflake’s API and parsing the JSON into usable tables, they elected to use dbt to model the usage data, and placed Metaplane on top of that output table to alert whenever costs spiked (note that this was prior to our release of usage analytics).
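As a rough illustration of alerting on cost spikes from a modeled usage table, here is a minimal Python sketch. It does not reflect how Metaplane detects anomalies or how Vivian's dbt model is structured. The row layout, warehouse names, dates, credit values, and the "twice the trailing median" rule are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows from a modeled usage table:
# (usage_date, warehouse_name, credits_used)
rows = [
    ("2023-05-01", "SNOWPARK_WH", 2.5),
    ("2023-05-01", "SNOWPARK_WH", 1.5),
    ("2023-05-02", "SNOWPARK_WH", 4.5),
    ("2023-05-03", "SNOWPARK_WH", 3.5),
    ("2023-05-04", "SNOWPARK_WH", 12.0),
    ("2023-05-01", "REPORTING_WH", 10.0),
    ("2023-05-02", "REPORTING_WH", 11.0),
    ("2023-05-03", "REPORTING_WH", 10.5),
    ("2023-05-04", "REPORTING_WH", 11.0),
]


def daily_credits(rows):
    """Sum credits per warehouse per day."""
    totals = defaultdict(lambda: defaultdict(float))
    for usage_date, warehouse, credits in rows:
        totals[warehouse][usage_date] += credits
    return totals


def cost_spikes(rows, factor=2.0):
    """Flag each warehouse whose latest day exceeds `factor` times
    the median of its earlier days."""
    spikes = []
    for warehouse, days in daily_credits(rows).items():
        dates = sorted(days)  # ISO date strings sort chronologically
        if len(dates) < 2:
            continue  # not enough history to form a baseline
        latest = dates[-1]
        baseline = median(days[d] for d in dates[:-1])
        if days[latest] > factor * baseline:
            spikes.append((warehouse, latest, days[latest]))
    return spikes


print(cost_spikes(rows))  # [('SNOWPARK_WH', '2023-05-04', 12.0)]
```

Comparing against the median of earlier days keeps a single unusually expensive day in the history from inflating the baseline.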
In addition to monitoring the outputs of models, Metaplane also monitors the job runtimes themselves, to alert the team to additional latency that might impact their downstream usage - this is particularly important when the outputs of models are directly fed into user experiences.
Summary
With dbt, Max was able to:
- Enable both analysts and engineers to generate models faster than ever before
- Improve customer satisfaction and retention
With Metaplane, Max was able to:
- Increase data quality coverage across all dbt models and outputs
- Save weeks of effort in creating and updating tests
❝ We had an OKR last quarter related to data quality that Metaplane helped us achieve”
Max also had these final words to say:
❝ dbt is a critical part of our infrastructure, and Metaplane allows us to ensure that it is running smoothly around the clock."