Maintain healthy data pipelines with Metaplane and Airbyte

Metaplane now integrates with Airbyte so you can understand the health of your pipelines over time.

June 26, 2024

Co-founder / Engineering


We’ve always said that data teams should be the first to know about data issues. That’s especially true when it comes to data ingestion pipelines, where failures can cascade far downstream. 

Today, we're excited to announce our integration with Airbyte. Airbyte, the leading open-source data movement platform, enables more than 6,000 companies to sync data from 300+ structured and unstructured data sources to data warehouses, databases, and more.

More pipelines, more problems

Use cases powered by data, from BI to AI to data products, are increasingly important. As the stakes increase, the need to get ahead of common data pipeline problems is higher than ever:

  • Pipeline failures. Pipelines inevitably break for many reasons: source system failures, resource contention, and invalid SQL, to name a few. These unexpected failures can cause delays and disrupt data flows, breaking downstream data processes and products.
  • Latency and performance issues. If a source or destination experiences performance issues and pipelines take longer than usual to finish, the effect ripples outward: every downstream job, transformation, and dashboard shows stale data.
  • Data quality issues. Sometimes data pipelines succeed and data is moved, but the volume or underlying quality of the data is compromised. These discrepancies between source and destination systems are quickly noticed by stakeholders using downstream data products.
  • Unknown downstream impact. Issues happen. But once you identify an issue, the two questions you ask yourself are: What does this impact? And who needs to know about this? Too often, it’s hard to find the answers to these questions.

Airbyte observability, unlocked

To make their platform scalable, extensible, and reliable across all these use cases, Airbyte has invested heavily in making it observable through rich metadata available via an API.

Through a deep integration with the Airbyte API, Metaplane’s Data Observability platform proactively monitors for these common pipeline issues, giving data teams peace of mind about their data.

Screenshot of Airbyte loading from multiple data sources, like Datadog and Zoom, into Snowflake.

Metaplane's integration with Airbyte takes minutes to set up. After connecting, Metaplane will populate with all of the Airbyte Workspaces, Connections, and Streams that it has access to. 

But that’s just the beginning: Metaplane provides out-of-the-box monitoring of important metrics and automatically syncs lineage so you can understand the health of your pipelines over time.

Monitoring Airbyte metrics with machine learning

Metaplane then automatically ingests metrics about the syncs that Airbyte ran for each Connection, including:

  1. Last succeeded sync timestamp, to ensure pipelines are up and running
  2. Duration of syncs, to catch latency issues when syncs take too long to complete
  3. Number of rows synced and bytes synced, to catch unexpected changes in the volume of data loaded

Screenshot of Airbyte loading from multiple data sources into Snowflake.
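To make these metrics concrete, here is a minimal sketch of how they could be derived from a list of sync records. The `SyncRun` shape and the `summarize` helper are hypothetical names made up for illustration. They are not Airbyte's API schema or Metaplane's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SyncRun:
    """One sync for a Connection (hypothetical shape, not Airbyte's schema)."""
    connection_id: str
    status: str  # "succeeded" or "failed" in this sketch
    started_at: datetime
    finished_at: datetime
    rows_synced: int
    bytes_synced: int


def summarize(runs: list[SyncRun]) -> dict:
    """Derive the three metric families above from a Connection's sync runs."""
    succeeded = [r for r in runs if r.status == "succeeded"]
    # 1. Last succeeded sync timestamp (None if nothing has succeeded yet).
    last_success: Optional[datetime] = max(
        (r.finished_at for r in succeeded), default=None
    )
    # The most recently started run supplies the duration and volume metrics.
    latest = max(runs, key=lambda r: r.started_at) if runs else None
    duration: Optional[timedelta] = (
        latest.finished_at - latest.started_at if latest else None
    )
    return {
        "last_succeeded_at": last_success,
        "latest_duration": duration,  # 2. Duration of syncs
        "latest_rows_synced": latest.rows_synced if latest else None,  # 3. Volume
        "latest_bytes_synced": latest.bytes_synced if latest else None,
    }


# Example data (invented): one healthy sync, then a slow failed one.
runs = [
    SyncRun("orders", "succeeded", datetime(2024, 6, 25, 0, 0),
            datetime(2024, 6, 25, 0, 10), rows_synced=1000, bytes_synced=50_000),
    SyncRun("orders", "failed", datetime(2024, 6, 26, 0, 0),
            datetime(2024, 6, 26, 0, 45), rows_synced=0, bytes_synced=0),
]
summary = summarize(runs)
```

In this example, the last succeeded timestamp stays on June 25 while the latest run took 45 minutes and moved no rows. That combination is the kind of signal the three metrics are meant to surface.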

Metaplane maintains a historical log of these metrics so you can look back over time. We then automatically train machine learning models on that historical metadata, taking trends, seasonality, and user feedback into account.

The end result is that you’re proactively alerted to anomalies without having to write any code or toil with configuration. Metaplane-Airbyte users can catch pipeline issues like long-running syncs, late-arriving data, and successful syncs that did not move the right amount of data from source to destination.
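For intuition about what "anomaly" means here, the sketch below flags a sync metric that falls far outside its recent history, using a median-based (modified z-score) check. This is a deliberately simple stand-in rather than Metaplane's models, which also account for trends, seasonality, and user feedback. The function name, the threshold, and the minimum history length are assumptions chosen for illustration.

```python
from statistics import median


def is_anomalous(history: list[float], latest: float, threshold: float = 3.5) -> bool:
    """Flag `latest` if its modified z-score against `history` exceeds `threshold`.

    Illustrative only: no trend or seasonality handling.
    """
    if len(history) < 5:
        return False  # too little history to judge (assumed minimum)
    center = median(history)
    # Median absolute deviation: a spread estimate that resists outliers.
    mad = median(abs(x - center) for x in history)
    if mad == 0:
        # History is constant, so any different value is unusual.
        return latest != center
    modified_z = 0.6745 * (latest - center) / mad
    return abs(modified_z) > threshold


# Rows synced per run for one Connection (invented numbers).
rows_history = [1000, 1020, 980, 1010, 990, 1005]
sync_moved_too_little = is_anomalous(rows_history, 120)
sync_looks_normal = not is_anomalous(rows_history, 1008)
```

The same check could be pointed at sync duration or at the time since the last successful sync, which correspond to the long-running and late-arriving cases described above.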

Automatic stream-to-destination lineage

In addition to metrics, you'll instantly see both upstream and downstream warehouse lineage for Streams. If you’re loading data from a database, we’ll include the table that is upstream of a stream. If you’re syncing into a warehouse or database, we’ll create a link between your stream and the table loaded by that stream.

Metaplane's lineage includes source systems all the way to BI and Reverse ETL syncs (and even dbt exposures!)

Now you have full end-to-end lineage from source to destination, whether that’s a BI dashboard or a reverse ETL sync. This helps teams build awareness of data – where it comes from, and how it’s used – as well as the potential root causes and downstream impact of data incidents. 
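One way to picture this lineage is as a directed graph in which each node feeds the nodes downstream of it. The sketch below walks such a graph to answer "what does this impact?" and "where did this come from?". The node names and the `LINEAGE` structure are hypothetical examples, not Metaplane's data model.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the nodes in its list.
LINEAGE = {
    "postgres.public.orders": ["airbyte.stream.orders"],
    "airbyte.stream.orders": ["snowflake.raw.orders"],
    "snowflake.raw.orders": ["dbt.model.fct_orders"],
    "dbt.model.fct_orders": ["bi.dashboard.revenue", "reverse_etl.sync.crm_orders"],
}


def downstream(node: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every node reachable from `node`, in breadth-first order."""
    found: list[str] = []
    visited = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in visited:
                visited.add(child)
                found.append(child)
                queue.append(child)
    return found


def upstream(node: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every node that feeds `node`, by walking the edges in reverse."""
    reversed_edges: dict[str, list[str]] = {}
    for parent, children in edges.items():
        for child in children:
            reversed_edges.setdefault(child, []).append(parent)
    return downstream(node, reversed_edges)


impacted = downstream("airbyte.stream.orders", LINEAGE)
sources = upstream("snowflake.raw.orders", LINEAGE)
```

Here, an incident on the `orders` stream would reach the warehouse table, the dbt model, the dashboard, and the reverse ETL sync, while a bad warehouse table traces back to the stream and its source table.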

Metaplane and Airbyte are better together

Ready to get started? Sign up for Metaplane for free and start monitoring and pulling in lineage from Airbyte in minutes.

Metaplane supports monitoring both the Airbyte Cloud and open-source versions. For more information and support, including how to integrate using username/password authentication, visit our Airbyte docs.
