Import Historical Data Backfill
Metaplane's latest feature eliminates the training period that machine learning models normally need.
Metaplane was built to find data quality issues across your data stack. Traditionally, this was done with unit testing: unit tests profiled and re-profiled your data through sampling queries and flagged any values that fell outside the parameters you defined. Metaplane's anomaly detection differs from that approach because its data quality monitors are based on machine learning. These monitors work out of the box and require 3 days of training on your real data. That training lets us set appropriate ranges for your data quality metrics based on factors like trends and seasonality, and the models continue to update as your data changes.
At least—that’s how it used to be.
On August 8th, we announced our new monitoring platform, with the promise of more to come. Our most recent product release, the ability to backfill models with data from a .csv file, means that you no longer need to wait to reap the benefits of machine learning for your data quality monitors.
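To make the idea of learned ranges concrete, here is a deliberately simple sketch. It is not Metaplane's actual model. It groups historical daily values by day of week, as a crude stand-in for seasonality, and flags any value more than `k` standard deviations from that weekday's mean. The function names, the weekday grouping, and `k=3` are illustrative assumptions. What it shows is that the ranges come entirely from historical values, so imported history can take the place of days of live observation.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean, stdev


def seasonal_ranges(history, k=3.0):
    """Build an expected (low, high) range per weekday from (date, value) pairs.

    Illustrative only: a real monitor would also model trends, holidays, etc.
    """
    by_weekday = defaultdict(list)
    for day, value in history:
        by_weekday[day.weekday()].append(value)  # 0 = Monday ... 6 = Sunday

    ranges = {}
    for weekday, values in by_weekday.items():
        if len(values) < 2:
            continue  # stdev needs at least two points to estimate spread
        mu, sigma = mean(values), stdev(values)
        ranges[weekday] = (mu - k * sigma, mu + k * sigma)
    return ranges


def is_anomaly(day, value, ranges):
    """Return True if value falls outside the learned range for day's weekday."""
    bounds = ranges.get(day.weekday())
    if bounds is None:
        return False  # no history for this weekday, so nothing to compare against
    low, high = bounds
    return not (low <= value <= high)


# Hypothetical "backfilled" history: four weeks of daily row counts,
# with high weekday volume and low weekend volume.
start = date(2023, 7, 3)  # a Monday
history = []
for i in range(28):
    day = start + timedelta(days=i)
    week = i // 7
    base = 200 if day.weekday() >= 5 else 1000
    history.append((day, base + 10 * week))

ranges = seasonal_ranges(history)
print(is_anomaly(date(2023, 7, 31), 1020, ranges))  # Monday at weekday volume
print(is_anomaly(date(2023, 8, 5), 1000, ranges))   # Saturday at weekday volume
```

Because the ranges are keyed by weekday, a count of 1,000 rows is normal on a Monday but anomalous on a Saturday. A single global range would miss that distinction.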
Use cases
Anyone eager to start using Data Observability to find data quality issues benefits right away, but there are a few specific scenarios in which you might want to use this new feature:
- Warehouse migrations - Imagine you’ve recently made the decision to switch vendors. While you’ll still need to extract and import any historical data that you’ll want to use in future analytics, you can now use those same extracts to pre-train Metaplane monitors. These data quality safeguards help foster a smooth migration.
- Dev vs. prod environments - Teams that keep a separate environment of cloned production data, usually for testing, can use this feature to pre-train monitors on a new dataset they want to watch for data quality anomalies in the development environment.
- “Snippets” from another table - In some scenarios, you might create new views and tables derived from a parent table, for example a breakout of revenue per day from a parent transactions table. You can use data from the parent table to train monitors on the corresponding child tables.
Where to get started
To get started, log in to Metaplane or create a free account. From there, navigate to a monitor that you’d like to import training data for. You can find the import button in the actions menu of any monitor page.
Note that your CSV will need to follow a particular format, as outlined in our documentation. Please get in touch with our team if you’d like help with extracting or formatting imports.
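As an illustration of the “snippets” use case, the sketch below rolls a parent transactions extract up into daily revenue totals and writes them as a two-column CSV. The column names (`timestamp`, `value`), the date format, and the helper names are hypothetical placeholders, not Metaplane's import format. The format Metaplane actually expects is defined in its documentation, so adjust the header and columns to match.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def daily_revenue_rows(transactions):
    """Sum (ISO-8601 timestamp, amount) pairs into sorted (YYYY-MM-DD, total) rows."""
    totals = defaultdict(float)
    for ts, amount in transactions:
        day = datetime.fromisoformat(ts).date().isoformat()
        totals[day] += amount
    return sorted(totals.items())


def write_backfill_csv(rows, out):
    """Write rows to a file-like object using a hypothetical two-column layout."""
    writer = csv.writer(out)
    writer.writerow(["timestamp", "value"])  # placeholder header; check the docs
    for day, total in rows:
        writer.writerow([day, f"{total:.2f}"])


# Hypothetical extract from the parent transactions table.
transactions = [
    ("2023-07-01T09:15:00", 120.00),
    ("2023-07-01T17:40:00", 80.50),
    ("2023-07-02T11:05:00", 42.25),
]

buf = io.StringIO()
write_backfill_csv(daily_revenue_rows(transactions), buf)
print(buf.getvalue())
```

To write a real file instead of an in-memory buffer, pass `open("backfill.csv", "w", newline="")` as `out`. The `newline=""` argument is what the `csv` module documentation recommends so that line endings are not doubled.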
Read more on recent improvements to the Metaplane platform here!