Deep Dive: Data Observability and Transactional Databases
Transactional databases are an important part of the modern data stack—and they shouldn't be left out of your data observability journey.
In many modern data stacks, important data starts in transactional databases like MySQL, SQL Server, or PostgreSQL. Part of what makes transactional databases so attractive is being purpose-built for handling everything from user events from your software apps and activities from website visitors, to logs of manufacturing processes and records of e-commerce sales, not to mention, free for PostgreSQL and MySQL. The data frequently gets replicated in a data warehouse and flows into the BI tool for analysis, but the transactional database is often the bedrock that the rest of the data stack is built upon.
Traditionally, data observability focused mostly on the warehouse and (sometimes) on the BI tool. But when issues occur, understanding how the data flows and changes in its journey through the stack can be invaluable to catching and resolving data anomalies. That’s why Metaplane has deep integrations with MySQL, SQL Server, and PostgreSQL to help give you peace of mind that your data replication process is working as designed, and to give you the full picture of the data quality of your whole data stack.
Schema changes
Data observability, as a field, is mostly concerned with the quality or status of the data itself—the volume or freshness or distribution. But the structure of the database is just as important. And in many companies, the transactional database is frequently the domain of another team entirely, meaning that data engineers and analysts can sometimes be out of the loop when it comes to important developments in the database. If a column’s type changes, your downstream queries might have issues, particularly for math functions. If a column gets deleted or re-named, the dashboards and workbooks that depend on it might break. And, on a happier note, if a new table gets created, one that’s filled with useful data, you probably want to know as soon as possible so you can start using it in your queries, your BI software, and whatever other tools you power with data.
That’s where Metaplane’s schema change feature comes in. By connecting Metaplane to your transactional database, Metaplane will send you an alert about any changes to the database: new tables, deleted columns, type changes, re-named entities, and so on. This way, data engineers can make sure they’re never out of the loop when it comes to knowing what’s happing upstream of the warehouse, and they can react to any changes before they cause issues.
Common database replication issues
Database replication can go awry for any number of reasons, whether due to connection interruptions, unexpected schema drift, and/or a need for more atomicity in last_modified timestamps being used to page extract queries. When issues do come up, you’ll usually find them in the form of incomplete or duplicate data.
Metaplane’s monitors can help catch issues like this when they occur: uniqueness monitors can catch duplicates in ID fields, and row count monitors can catch unusual spikes in the number of records or missing data.
End-to-end lineage
As we mentioned above, data flows between several tools in modern data stacks. Each tool has its own metadata about the data it handles, and all hold pieces of the story as your data moves from system to system. But this also means that understanding the lineage of your data can feel like assembling a puzzle where the pieces are scattered across different systems.
Metaplane helps make tracing that lineage simple. You can look at a database or report and trace the lineage of a column upwards as it moves through the data warehouse, all the way to the Fivetran connector that replicates the data. From there it’s an easy task to check where that Fivetran connector is pulling data from in your transactional database. This kind of analysis is incredibly useful when you’re trying to trace the root cause of an issue, and makes refactoring your data a much simpler task.
No matter your company’s unique data challenges, understanding the full picture of your data stack is critical for improving or maintaining the health of your data—and Metaplane can help. If you’d like to try out Metaplane’s transactional database integration, you can sign up for free, and get started right away, or chat with one of our product experts to learn more.
Table of contents
Tags
...
...