How Bluecore Reduced Integration Issues by Over 75% with Metaplane
"Metaplane has transformed how we operate, especially the GROUP BY monitors. They catch even the smallest variations in our data. I can finally rest easy knowing every source of every size is covered."
Bluecore’s retail shopper identification and customer movement technology quickly generates incremental revenue for enterprise brands by turning more anonymous shoppers into known customers, and repeatedly and efficiently moving them through the purchase funnel.
Nicole Dallar-Malburg, a Senior Analytics Engineer at Bluecore, is part of the Data Operations (DataOps) team responsible for managing their data stack—including BigQuery, dbt Cloud, and Hex—which allows Bluecore to:
- Build scalable analytics systems and processes, creating a single source of truth for customer-facing analytics and data exports as well as internal reporting and analysis.
- Provide customers with insight into their businesses.
- Empower their internal teams to answer questions about customer health, product adoption, financial performance, and more.
More failure points, more problems
Bluecore collects, manages, and analyzes terabytes of data for 400+ brands every day, totaling hundreds of terabytes of data processed per month. With that much data under their purview, they’d occasionally run into issues like:
- Empty purchase files or files with nonsensical date values.
- Misformatted file names that don't adhere to the proper specifications.
- Re-platforming that disrupts an existing integration designed to track events on a customer's website.
These issues, however uncommon, eroded internal and external trust in their data. In response, Bluecore launched a company-wide initiative to increase the accuracy of their reporting and, with it, trust in their system.
Without trust in their system and reports, Bluecore risked making suboptimal decisions based on flawed data and running into unnecessary hiccups in their daily operations. But their existing monitoring tool could only monitor the data they were ingesting at a high aggregate level and didn't allow them to catch small changes that could indicate potential problems.
This was seen as a major risk for Bluecore given that relevant, timely marketing touches with e-commerce shoppers rely on accurate and precise data. For instance, if shoppers add products to their cart and then abandon it, and Bluecore doesn't capture this event, their customers can't send “abandoned cart” emails to those shoppers. And in e-commerce, a missed abandoned cart email might mean a missed sales opportunity.
Here’s a hypothetical example of where Bluecore’s previous data monitoring platform could fall short:
- Each customer feeds data through various channels.
- Bluecore consolidates that incoming data into the relevant datasets.
- For example, the purchase data from one customer could have three sources (e.g., Source 1 contributes 90%, Source 2 contributes 6%, Source 3 contributes 4%).
Their previous monitoring wasn’t granular enough to catch a problem with a minor source (e.g., Source 3's contribution dropping from 4% to 1%). Because that source makes up only a small share of the total, the drop barely moves the aggregate numbers and might not trigger an alert.
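A quick worked example shows why such a drop can hide in the total. The 90/6/4 split and the fall to roughly 1% come from the example above; the absolute row counts and the 10% alert threshold are hypothetical values chosen for illustration.

```python
# Hypothetical daily row counts for one customer's purchase data, split by source.
# Only the 90% / 6% / 4% split comes from the example; the absolute numbers are made up.
baseline = {"source_1": 9_000, "source_2": 600, "source_3": 400}
# Source 3 drops to 100 rows, about 1% of the new total (100 / 9,700).
today = {"source_1": 9_000, "source_2": 600, "source_3": 100}

ALERT_THRESHOLD_PCT = 10.0  # hypothetical: alert on a change larger than 10%


def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


# Aggregate view: one number for the whole table.
total_change = pct_change(sum(baseline.values()), sum(today.values()))
aggregate_alert = abs(total_change) > ALERT_THRESHOLD_PCT

# Grouped view: one series per source, as a GROUP BY monitor would track.
per_source_change = {s: pct_change(baseline[s], today[s]) for s in baseline}
source_alerts = [s for s, c in per_source_change.items() if abs(c) > ALERT_THRESHOLD_PCT]

print(f"aggregate change: {total_change:.1f}% -> alert: {aggregate_alert}")
print(f"per-source changes: {per_source_change}")
print(f"sources alerting: {source_alerts}")
```

The table-level total moves by only 3%, well under the threshold, while Source 3 on its own falls by 75%.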
Bluecore needed more detailed monitoring to capture important data nuances and ensure that incoming data is accurately captured, processed, and triggers the correct actions (e.g., new subscriber outreach, attributing purchases). Ultimately, these nuances create a relevant and personalized experience for shoppers, which helps Bluecore’s retailer customers grow their businesses.
❝One of our ongoing major initiatives is increasing trust and transparency in our platform. We want our teams to have the tools they need to find, diagnose, and fix system issues before the customer is impacted. - Nicole Dallar-Malburg
But given the complex needs of their enterprise customers, there were multiple failure points throughout each dataset’s lifecycle—any of which could disrupt the delivery of targeted marketing communications. Bluecore needed to guarantee that every customer click and cart addition was seen, logged, and acted upon.
Metaplane’s data observability: an automated line of defense
Bluecore quickly realized that a robust data observability platform could verify that data was accurately captured and processed at each step of the data flow, from recognizing a shopper’s activity on an online store to capturing purchases.
They briefly considered building an in-house tool, but the ongoing resources and cost of maintaining an in-house solution weren’t sustainable. The DataOps team would have to update and maintain tens of thousands of unit tests across hundreds of customer datasets, each with 100+ tables.
Bluecore needed an ML-powered solution that would dynamically learn the expected behavior of their customers’ data. They also wanted to give feedback to the model, since their teams have specialized knowledge about each customer and its seasonal behavior. For example, if a monitor shows a spike that the team knows is expected, they can mark it as normal so the monitor learns to expect that variation in the future, reducing noisy alerts.
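The feedback loop described above can be sketched as a simple statistical band that learns only from values it considers normal, plus any values a user explicitly marks as normal. This is a hypothetical Python illustration, not Metaplane's model: the class name, the "mean plus or minus 3 standard deviations" rule, and the sample row counts are all assumptions standing in for a learned threshold.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev


@dataclass
class FeedbackThreshold:
    """Toy anomaly band (mean +/- k * stdev) that learns from user feedback.

    Hypothetical illustration: values flagged as anomalous are kept out of the
    training history unless a user marks them as normal.
    """

    k: float = 3.0
    history: list[float] = field(default_factory=list)

    def bounds(self) -> tuple[float, float]:
        """Current expected range; needs at least two historical values."""
        mu, sigma = mean(self.history), stdev(self.history)
        return mu - self.k * sigma, mu + self.k * sigma

    def is_anomaly(self, value: float) -> bool:
        low, high = self.bounds()
        return not (low <= value <= high)

    def observe(self, value: float, marked_normal: bool = False) -> bool:
        """Score a value, then learn from it if it was in range or marked normal."""
        anomalous = self.is_anomaly(value)
        if not anomalous or marked_normal:
            self.history.append(value)
        return anomalous


monitor = FeedbackThreshold(history=[1000, 1020, 980, 1010, 990])
print(monitor.observe(1500))                      # True: the spike is flagged, not learned
print(monitor.observe(1500, marked_normal=True))  # True: still flagged, but now learned
print(monitor.is_anomaly(1500))                   # False: the band has widened to include it
```

Marking a single spike as normal widens the band for every future value, which also makes genuine drops harder to catch; a production model would more likely account for seasonality (day of week, holidays) than simply widen the band.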
Once they nixed the in-house solution, Bluecore looked into buying an off-the-shelf data observability platform. They evaluated several vendors but struggled to find a solution that could:
- Handle their terabytes of data
- Provide granular monitoring that could capture customer-, data type-, and source-specific nuances
- Predict data quality metric thresholds using machine learning, rather than requiring thresholds to be set manually for hundreds of customers
- Integrate with their existing data stack built around BigQuery and dbt
❝We looked at another observability platform… and they couldn't handle the volume and complexity of our data. Even reading our metadata was too much for them. - Nicole Dallar-Malburg
They ultimately chose Metaplane because it was the one platform that satisfied all of their criteria, including its ability to handle large volumes of data, best-in-class GROUP BY monitors, ML-based alerting thresholds with user-overrides, and seamless integrations with dbt and Jira.
Monitoring dimensions of data with GROUP BY monitors
Bluecore has separate tables to support each customer’s data privacy requirements. They ingest raw customer data into their data warehouse, standardize it, and then apply any needed transformations on top of the standardized data. At every step, each customer’s data is kept separate, and each of those tables needs monitors to guarantee that customer data was ingested successfully.
Using Metaplane, Bluecore can implement monitors for each customer’s table and event type, grouping by data source to track row count and data freshness at a granular customer and data source level. In their case, freshness and row count monitors work as a one-two punch:
- Freshness monitors act as the first safeguard, alerting Bluecore when no new data arrives within a specified timeframe, which might mean that a daily feed didn't arrive as expected.
- Row count monitors identify anomalies (e.g., receiving significantly fewer records than usual) that wouldn't trigger a freshness alert, because some data did arrive, but still indicate a problem.
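The freshness and row count checks above can be sketched in Python with an in-memory SQLite table: one `GROUP BY` query collects a row count and a latest timestamp per source, and each reading is compared with an expectation. The table, sources, timestamps, expected counts, 24-hour freshness window, and 50% volume floor are all hypothetical, and the expectations are hard-coded where a platform like Metaplane would learn them.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical event table for one customer; names and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (source TEXT, event_time TEXT)")

now = datetime(2024, 5, 2, 12, 0)
rows = (
    [("source_1", (now - timedelta(hours=1)).isoformat())] * 900   # healthy
    + [("source_2", (now - timedelta(hours=2)).isoformat())] * 20  # fresh but low volume
    + [("source_3", (now - timedelta(hours=30)).isoformat())] * 40  # normal volume but stale
)
conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)

# One row-count and freshness reading per source, as a GROUP BY monitor would collect.
readings = conn.execute(
    "SELECT source, COUNT(*), MAX(event_time) "
    "FROM purchases GROUP BY source ORDER BY source"
).fetchall()

# Hypothetical expectations; a real platform would learn these rather than hard-code them.
expected_rows = {"source_1": 900, "source_2": 60, "source_3": 40}
max_staleness = timedelta(hours=24)
min_ratio = 0.5  # alert if a source delivers under half its usual volume

alerts = []
for source, row_count, latest in readings:
    # Freshness check: ISO-8601 strings sort correctly, so MAX() gives the newest row.
    if now - datetime.fromisoformat(latest) > max_staleness:
        alerts.append((source, "freshness"))
    # Row count check: data arrived, but far less of it than expected.
    if row_count < min_ratio * expected_rows[source]:
        alerts.append((source, "row_count"))

print(alerts)
```

Source 2 still arrives on time but with a third of its usual volume, so only the row count check catches it; Source 3 has its usual volume but no new rows in 30 hours, so only the freshness check does. At the table level, the same data totals 960 rows against an expected 1,000 and its newest row is an hour old, so neither check would fire there.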
Metaplane’s ML-powered GROUP BY monitors ensure data continuity and accuracy with a feedback loop. Bluecore’s DataOps team can give feedback to train the model and make Metaplane’s monitors more precise and tailored to Bluecore’s specific data environment. Now, they can accurately monitor even the smallest data variations, debug with ease, and rest assured that the same attention is given to small and big sources alike—without any unnecessary noise.
❝Whenever there's a new source added, or a source goes away, or [our customers] rename a source—whatever it might be—that new data flows into our dbt model. And then Metaplane automatically picks it up without any manual intervention on my end. - Nicole Dallar-Malburg
Improving slow, manual triaging processes with Metaplane’s Jira integration
Data observability is just as much about process as it is about alerting. If there’s an error alert, but no internal process to support it, then the alert is just noise without a signal.
Bluecore has long had a dedicated triage team that responds to initial alerts. They operate independently of specific customers or integrations to determine whether an alert needs escalation to the product support team. While thorough, their previous triage process was slow because it involved several manual steps:
- Identify an issue.
- Verify and confirm the issue by checking the data.
- Log into Jira to create a ticket.
- Assign the ticket to the appropriate team.
- Fill in the necessary details on the ticket (e.g. the issue originated from Metaplane).
- Ensure the ticket reaches the responsible team member for further investigation.
Now, with Metaplane’s Jira integration, incidents identified in Metaplane can be turned directly into Jira tickets. This streamlines Bluecore’s formerly multi-step triage process in two ways:
- Reducing manual work in ticket creation
- Improving the accuracy of issue reporting
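One way to picture the accuracy gain is a function that copies an alert's details straight into a ticket instead of having someone retype them. The Python sketch below builds a request body for Jira's create-issue endpoint (`POST /rest/api/2/issue`) using the standard `project`, `issuetype`, `summary`, `description`, and `labels` fields. It is a hypothetical illustration, not how Metaplane's integration is built: the `Alert` fields, the `DATAOPS` project key, and the example URL are made up, and the sketch stops short of sending the request, which would also need authentication.

```python
import json
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record; field names are illustrative, not Metaplane's schema."""

    table: str
    monitor: str
    group: str
    observed: float
    expected_low: float
    expected_high: float
    url: str


def jira_issue_payload(alert: Alert, project_key: str, issue_type: str = "Bug") -> dict:
    """Build a request body for Jira's create-issue endpoint (REST API v2)."""
    summary = f"[{alert.monitor}] {alert.table} ({alert.group}) outside expected range"
    # Every detail comes from the alert itself, so nothing is lost or retyped by hand.
    description = (
        "Source: Metaplane alert\n"
        f"Table: {alert.table}\n"
        f"Group: {alert.group}\n"
        f"Observed: {alert.observed}\n"
        f"Expected: {alert.expected_low} to {alert.expected_high}\n"
        f"Alert link: {alert.url}"
    )
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": issue_type},
            "summary": summary,
            "description": description,
            "labels": ["data-quality", alert.monitor],
        }
    }


alert = Alert(
    table="customer_a.purchases",
    monitor="row_count",
    group="source_3",
    observed=100,
    expected_low=350,
    expected_high=450,
    url="https://example.com/alerts/123",
)
print(json.dumps(jira_issue_payload(alert, project_key="DATAOPS"), indent=2))
```

Version 2 of the API is used here because it accepts a plain-text description; version 3 expects the description in Atlassian Document Format.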
❝It's like a game of telephone when you're trying to get from one platform to another. So it's nice to know that we're able to port everything right over into Jira from Metaplane. - Nicole Dallar-Malburg
Successes and looking forward
With Metaplane, Bluecore has seen improvements in 3 main areas:
- Data quality issues from sources. Metaplane has identified 40+ instances where the data provided by external sources contained errors. In each case, Metaplane helped Bluecore proactively notify customers of the discrepancies, heading off potential downstream issues and preventing data processing delays.
- Broken Bluecore integrations. Bluecore used to react to 6-8 issues per week with customer integrations, mainly due to unexpected API outages from third-party sources. With Metaplane, they can now monitor these issues and intervene before they cause major incidents. Previously, even a major discrepancy like an 80% data shortfall on a lower-volume source might go unnoticed until it became critical. Now, Bluecore can quickly identify a 50% drop in data transmission from even a minor source; such sources carry less volume but are just as critical. This early detection minimizes the usual back-and-forth over who is responsible, streamlines resolution, and keeps data flowing continuously and accurately.
- More time for things that matter. Previously, manual tasks like monitoring data integrity, triaging issues, and liaising with customers over data discrepancies consumed 35% of the team's effort. These activities, while crucial, were reactive and took time away from proactive, value-adding work. Now, Bluecore’s data team no longer spends its workdays manually checking data flows and integration health. The team can focus more on strategic initiatives, like enhancing product features, refining customer engagement strategies, and exploring new avenues for data utilization to drive business growth.
❝Now, we're able to lay out the facts as they are. And it's been a lot easier to jump on those issues when they occur and get them resolved quickly. - Nicole Dallar-Malburg
Over the next few quarters, Bluecore intends to use Metaplane to expand its monitoring to new data types like product events and email deliverability. They also plan to expand their use of GROUP BY monitors for precise incident alerting and to integrate Metaplane with Slack for better-categorized alerts.
Overall, Bluecore is excited to continue using Metaplane to improve their incident management efficiency, especially since they’ve introduced their Customer Movement Technology and Services. The new Customer Analytics reporting helps Bluecore customers better predict customer retention and churn, using signals specific to each customer. Thanks to Metaplane's monitoring and alerting, Bluecore can trust the data they ingest, serve actionable insights, and empower retailers to thrive in any economic environment.