How Appcues reduced data quality issues by 77% using Metaplane, Snowflake, and dbt
“The important thing is that when things break, I know immediately—and I can usually fix them before any of my stakeholders find out.”
Not enough resources, difficulty storing large amounts of data, and not enough trust
As a growing company, Appcues depends on data for product, operations, sales, marketing, and customer success. But the data team is small: Andy Mackenzie, Business Intelligence Architect at Appcues, is the sole individual working exclusively on data.
“Virtually everyone in the company uses the data I produce on a daily basis,” said Andy. “They count on me to keep that data clean and error-free.”
To serve his team, Andy has to be as efficient as possible. “The faster I can rapidly build, test, and publish changes to our data schema, the more effective I can be,” he said.
Andy’s job centers around managing data and then piping it into Snowflake and other third-party tools. He also manages that data with dbt and ETL to prep it for Looker, the team’s visualization tool.
Faced with limited resources, difficulty storing large volumes of data, and a general loss of trust in data across the organization, Appcues turned to Snowflake, dbt, and Metaplane and saw substantial improvements in data management.
Specifically, these three tools together allowed Andy and the Appcues team to scale performance, iterate quickly, and monitor data quality.
Snowflake + dbt + Metaplane = The perfect combination
Andy and the Appcues team adopted Snowflake, dbt, and Metaplane to help solve their challenges. But why did Andy select each tool? And how does using these tools together lead to success?
Snowflake offered integrations and the ability to scale performance quickly with simple SQL commands
Andy reviewed a number of tools before settling on Snowflake, ultimately finding that its ability to scale and wide set of integrations set it apart from the pack.
“Every tool seems to integrate with Snowflake, so the speed with which you can get data into Snowflake is fast.”
Snowflake is incredibly easy to set up and affordable, even for a team of one. With its deep ecosystem of integrations, Snowflake allowed Andy to adopt other best-in-class tools that helped increase his productivity and leverage.
“Snowflake has a ton of native integrations with our other data pipeline tools, like Segment, which can send events to Snowflake.”
Not only that, but Snowflake can easily scale when Andy needs to run compute-intensive queries, like running analysis over product event data. This can be done with simple SQL commands.
“At any given moment, we could be dealing with 10s of billions of rows. If I need to work with a really large data set and it's worth spending the compute to get that query back in a reasonable time, I can run a simple SQL command to scale my warehouse, and I immediately get that compute power and run my query on those 10 billion rows,” said Andy. “This type of scaling is incredibly easy with Snowflake but it isn’t with other tools.”
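To make the "simple SQL command" concrete, here is a minimal sketch of what resizing a warehouse could look like. It only builds the statement text and does not connect to Snowflake. The `ALTER WAREHOUSE ... SET WAREHOUSE_SIZE` statement is standard Snowflake SQL, but the warehouse name `ANALYTICS_WH`, the helper function, and the subset of sizes listed are assumptions for illustration, not details from Andy's setup.

```python
import re

# A subset of Snowflake warehouse sizes, smallest to largest (illustrative).
SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE", "XXLARGE")


def resize_warehouse_sql(warehouse: str, size: str) -> str:
    """Build the SQL statement that resizes a Snowflake virtual warehouse."""
    # Accept only plain unquoted identifiers so the name is safe to interpolate.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_$]*", warehouse):
        raise ValueError(f"unexpected warehouse name: {warehouse!r}")
    size = size.upper()
    if size not in SIZES:
        raise ValueError(f"unknown warehouse size: {size!r}")
    return f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{size}';"


# ANALYTICS_WH is a hypothetical warehouse name.
scale_up = resize_warehouse_sql("ANALYTICS_WH", "xlarge")    # before the heavy query
scale_down = resize_warehouse_sql("ANALYTICS_WH", "small")   # after it finishes
print(scale_up)
print(scale_down)
```

Scaling back down once the heavy query completes keeps the extra compute spend limited to the queries that actually need it.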
dbt was an inexpensive, flexible ETL tool that saved Appcues substantial time and money
Alongside Snowflake, Andy needed an ETL tool that could combine data from multiple data sources. He knew of a number of proprietary tools, but they were too expensive. He also considered building his own tool, but it would’ve taken a substantial amount of time, ultimately wasting resources.
“dbt was open source and it was free for me to get started, so it was a no-brainer to try it out,” he said. Being able to quickly see how it fit within his workflow made it easy for him to adopt the tool. Andy was able to deploy dbt to production in just under 24 hours. Features like Jinja templating let him write hundreds of lines of SQL in place of what would otherwise have been thousands of lines of unmaintainable queries.
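The line savings from templating come from generating repetitive SQL out of a short loop instead of writing each column by hand. The sketch below imitates that idea in plain Python rather than dbt's actual Jinja syntax; the `events` table, the `user_id` and `event_name` columns, and the event names are all hypothetical.

```python
def pivot_events_sql(event_names, source_table="events"):
    """Generate one counting column per event name, the way a template loop would."""
    columns = [
        f"    sum(case when event_name = '{name}' then 1 else 0 end) as {name}_count"
        for name in event_names
    ]
    return (
        "select\n    user_id,\n"
        + ",\n".join(columns)
        + f"\nfrom {source_table}\ngroup by user_id"
    )


# Hypothetical event names; adding an event means adding one list entry,
# not another hand-written SQL expression.
sql = pivot_events_sql(["signup", "login", "checkout"])
print(sql)
```

In a dbt model the same loop would live inside the SQL file itself, so the list of events stays in one place and the generated columns stay consistent.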
“There’s no way I could do what I do on my own if I didn’t have a tool like dbt,” he said. “Without dbt, you’d need a data engineer to pair with me all the time. Plus, I actually have free time to do analysis, which I definitely wouldn’t if I wasn’t using dbt.”
Andy estimates that using dbt has saved the business tens of thousands of dollars due to not needing to increase engineering resources or spend hours maintaining models.
Metaplane, a data observability solution that offered visibility and insight into data
To complement Snowflake and dbt, Andy also needed a data observability solution. This was particularly important, as he wanted to be the first one in the organization to know about data issues.
Before Metaplane, data quality issues were flagged by Andy’s stakeholders — sometimes several times in one week.
“There were weeks in which I spent much of my time dealing with data quality issues that my colleagues identified,” Andy said. “I wouldn’t even know about an issue until someone pinged me – it reflected poorly on me.”
Without a data observability platform, Andy was flying blind. He would try to get it right the first time when he was pushing out changes, but if something broke, he’d need to go and fix it. This process wasted time and burned him out.
Then Andy found Metaplane. He liked that it was free to try and that he could automatically add tests across his warehouse. Ultimately, Andy configured Metaplane to monitor data quality across hundreds of tables in Snowflake and dbt, a task that would have been impossible for Andy to do on his own.
Plus, he could receive alerts in Slack and adjust the machine learning models with the click of a button. This allowed him to quickly learn how his data behaves and be the first to know about data quality issues, not the last.
Easily store and model trusted data all in one place with fast deployment
By using Snowflake, dbt, and Metaplane, Andy is better positioned to help his team. He’s able to easily store and model trusted data all in one place and make changes quickly and efficiently.
“Snowflake and dbt are huge factors in enabling me to get our data to our tools and ultimately at the fingertips of our sales team,” said Andy. “Using these with Metaplane ensures that not only is the data where it needs to be, but that it’s the right data that can help my team make decisions.”
Results
- Appcues now has a scalable data stack built on Snowflake, dbt, and Metaplane that can service the entire organization and scale with a growing team
- Snowflake offered integrations and the ability to scale performance quickly with simple SQL commands, making it easy for one individual to manage
- With dbt, Appcues could easily scale up writing data models without needing to build their own framework or hire additional data engineers
- After adopting Metaplane, Andy now experiences 77% fewer data quality issues. The entire team is more productive because they spend on average 20 to 40 fewer hours per month investigating issues with their data.