Gorgias knows their data is always accurate and gorgeous with Metaplane
“With Metaplane, we find issues not just when dashboards are completely broken, but when the data itself is wrong.”
As the Data & Analytics Manager at Gorgias, Elliot Trabac oversees a team of five in what could be described as a data team's paradise, with a leadership team that showcases unmatched data fluency. Every member of that leadership team knows SQL, which shows just how important data is at Gorgias. Elliot’s team is responsible for everything from data extraction and loading to modeling data so that the analytics team can create dashboards and reports. They had previously used Segment along with Hull to centralize and analyze both product events and data from business applications, but decided to set up an analytics-focused data stack, with Segment continuing to track events.
The team's aim was not just to centralize data, but to ensure its scalability for the future, facilitate error detection, and optimize data usage across different types. Leveraging their experience, they successfully set up their data stack, albeit with some challenges along the way.
Challenges
Elliot’s team already had a head start in establishing their data stack, starting with the sources and metrics that had previously existed in Segment, and was familiar with setting up basic dbt tests to check whether data was flowing properly. What they knew they had to watch out for, rather than obvious issues, were silent data bugs: errors in the data itself that they couldn’t have known to monitor for.
❝ In many cases, dashboards aren’t ‘broken’ with red errors and no data, but the worst thing is to make a decision on data that is wrong.”
The team also knew, from their previous experiences, that they’d want to set up a way to avoid future incidents caused by code changes to models. Changes made to models, such as renaming columns or recalculating column values, can cause unexpected changes to downstream models and dashboards. For example, a missing domain field in a company table might be innocuous when nothing references it, but if that field is used as the unique ID, it could cause multiple dashboard outages.
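To make the unique ID scenario concrete, here is a minimal sketch of a check for missing and duplicated key values. The `check_key_column` helper, the `companies` rows, and the `domain` column name are all hypothetical illustrations, not Gorgias’ actual models or Metaplane’s implementation.

```python
from collections import Counter


def check_key_column(rows, key):
    """Return (missing_count, duplicated_values) for a candidate unique key.

    `rows` is a list of dicts; `key` is the column expected to be unique.
    Both are illustrative stand-ins for a real warehouse table.
    """
    values = [row.get(key) for row in rows]
    # Treat None and empty strings as missing values.
    missing = sum(1 for v in values if v is None or v == "")
    counts = Counter(v for v in values if v not in (None, ""))
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    return missing, duplicates


# Hypothetical company table where `domain` serves as the unique ID.
companies = [
    {"name": "Acme", "domain": "acme.com"},
    {"name": "Acme EU", "domain": "acme.com"},
    {"name": "Globex", "domain": None},
]

missing, duplicates = check_key_column(companies, "domain")
print(f"missing: {missing}, duplicated: {duplicates}")
```

A check like this stays silent while the column is unused, which is exactly why the problem only surfaces once a downstream model starts joining on it.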
Solution
With tools such as BigQuery, Fivetran, Hightouch, and dbt recently implemented, Gorgias turned to Metaplane to ensure that their hard work would have an extra layer of data quality certification.
❝ From the beginning (of setting up the stack), we wanted to proactively avoid the situation where people were always sending messages on Slack with doubts about the data.”
Expanding Data Quality Monitoring
Out of the box, Metaplane provided immediate value with schema change notifications, which, in some cases, were triggered by Fivetran’s default schema drift handling settings. Elliot’s team was immediately able to understand when queries should be adjusted.
❝ We have almost 20 people developing some models in dbt - it’s pretty easy to do a change that might have some downstream effects that break other stuff.”
They were also able to quickly expand on their dbt test strategy - which included extending freshness tests beyond a few core tables, and exploring additional “simple” monitors such as row and column counts, as early indicators of data health. These monitors were much quicker to deploy than similar dbt tests, and the Gorgias team also created monitors for dbt itself, to make sure that job runtimes were still behaving as expected.
❝ We wanted something that could spot issues at any time of the day (outside of transformation workflows) that could catch issues not just with the models, but in the data itself too.”
For all of their monitors, they’ve enjoyed not needing to continuously update test thresholds and the simplicity of being able to set up new monitors with just a few clicks.
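As a rough illustration of why a monitor might not need hand-tuned thresholds, the sketch below derives an acceptable range for a row count from its own recent history. It uses a simple mean and standard deviation rule chosen for illustration; it is an assumption, not a description of how Metaplane actually computes its thresholds, and the `is_anomalous` function and sample counts are hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, z=3.0):
    """Flag `latest` if it falls outside mean +/- z * stdev of `history`.

    The bounds move with the data, so nobody has to edit a fixed threshold
    as the table grows. Returns False when there is too little history.
    """
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = stdev(history)
    lower, upper = mu - z * sigma, mu + z * sigma
    return not (lower <= latest <= upper)


# Hypothetical daily row counts for one table.
daily_row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015]

print(is_anomalous(daily_row_counts, 1008))  # a typical day
print(is_anomalous(daily_row_counts, 400))   # a sudden drop in volume
```

The same pattern could apply to other metrics mentioned above, such as column counts or dbt job runtimes, by swapping in a different history series.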
Accelerating incident triage times
When triaging data incidents, they leveraged features like column-level lineage to visually identify where issues existed on the root table. In most cases, they traced issues back to raw tables, and were able to identify the tool failures responsible for data incidents, even finding issues with the source data itself in some cases. Gorgias also uses Castor, which has column-level lineage as well, but uses it in a different context.
❝ End users (analytics) would use Castor, our data catalog, for documentation on how to query the data, but Metaplane’s lineage can be useful for the data team to go up to the root data table (of the issue).”
Preventing future data quality incidents with Data CI/CD
To prevent data quality incidents, they set up Metaplane’s Data CI/CD tool, which added an extra layer of automated checks to their existing CI tests. This ensured they could identify affected models and prevent changes from impacting the data.
❝ Having additional information, like being aware of what changes will happen and which models they’ll affect, alongside existing CI/CD checks, adds an extra layer of security.”
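The idea of knowing which models a change will affect can be sketched as a walk over a dependency graph. The example below is a simplified illustration with a hypothetical `children` mapping and made-up model names; it is not Metaplane’s Data CI/CD implementation, which this case study does not describe in detail.

```python
from collections import deque


def downstream_models(children, changed):
    """Breadth-first walk returning every model downstream of `changed`.

    `children` maps a model name to the models that select from it.
    """
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Hypothetical lineage: raw table -> staging -> dimension -> dashboards.
children = {
    "raw_companies": ["stg_companies"],
    "stg_companies": ["dim_companies"],
    "dim_companies": ["revenue_dashboard", "churn_dashboard"],
}

affected = downstream_models(children, "stg_companies")
print(affected)
```

Surfacing a list like this in a pull request gives reviewers the impact of a change before it is merged, alongside the existing CI checks.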
Elliot and the Gorgias team, who’ve been using Metaplane over the past year, also had this to say:
❝ It’s been great to see some fast improvements in the product that fit within the developer workflow and thinking.”
Results
Elliot acknowledges that every data stack, including Gorgias’, will always have issues. But, with Metaplane, his team has been able to change how they handle those issues.
❝ The communication went from them (downstream consumers) telling us ‘something is wrong with the data’ to us telling them ‘we know we have an issue with a timestamp data type change caused by Fivetran, which broke dbt pipelines that are behind these models’”
The quote above is representative of Gorgias’ benefits from Metaplane, which can be summarized as:
- Maintained trust internally with stakeholders
- Avoided future issues caused by changes to dbt models
- Supported business growth with accurate data
What’s next?
- Scaling the infrastructure where they host tools such as Airbyte to accommodate increasing volume
- Identifying errors on the data producer side as they continue to scale out pipelines (e.g. new events or changed data types)
- Exploring and testing inputs for a metrics layer to boost the analytics team’s output