The great consolidation, Elon’s foray into SQL, and AI’s impact on day-to-day data engineering

Welcome to Overheard in data—a monthly roundup of news you can use from inside the data world.

February 25, 2025


The data world is, as you might expect, chronically online. It’s also pretty scattered. There are tons of great conversations happening across Reddit, Twitter/X, Substack, Medium, and yes, even LinkedIn. 

So, in the spirit of data engineering, here’s an attempt to take information from multiple sources, funnel it to a central location, and format it for you to read and enjoy. Welcome to Overheard in data, February ‘25 edition.

Elon Musk’s intro to data engineering

Elon Musk has his hand in a lot of different pots these days, and one of his most recent DOGE undertakings caught the attention of some data engineers.

Non-data folks might take this post at face value, but for us, it raises some questions.

  • Are all of the people collecting Social Security really just sitting in one single master database table?
  • Does that table really contain a single `IS_DEAD` column?
  • What did the SQL query actually look like?
  • What does the payout amount data look like for each age group? (A rough sketch of that kind of cut follows below.)
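
For fun, here's a purely hypothetical sketch of the kind of check those bullet points describe, written with Pandas rather than SQL so it's easy to run locally. The table, column names, and numbers are all assumptions for illustration, not the actual Social Security data.

```python
import pandas as pd

# Hypothetical "one big table" with an IS_DEAD-style flag, per the post's framing.
# Every value here is invented for illustration.
beneficiaries = pd.DataFrame({
    "birth_date": pd.to_datetime(["1870-01-01", "1950-06-15", "1990-03-02"]),
    "is_dead": [False, False, False],
    "monthly_payout": [0.0, 1850.0, 0.0],
})

beneficiaries["age"] = (pd.Timestamp.today() - beneficiaries["birth_date"]).dt.days // 365
beneficiaries["age_bucket"] = beneficiaries["age"] // 10 * 10

# The naive check: count "living" records per age bucket.
living = beneficiaries[~beneficiaries["is_dead"]]
naive_counts = living.groupby("age_bucket").size()

# A more context-aware cut: the same buckets with payout totals. A "living"
# 150-year-old who is paid $0 looks like a data-quality artifact, not fraud.
payouts_by_bucket = living.groupby("age_bucket")["monthly_payout"].agg(["size", "sum"])

print(naive_counts)
print(payouts_by_bucket)
```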

Of course, Elon isn't going to dive into the data engineering work that yielded these results, as it would fly over most people's heads. At the same time, calling this real data analysis is a major stretch, and it highlights why context and business logic are so important in the data world. Normally, when wonky data like this pops up, we don't run it up the flagpole as fact; instead, we dive in and find out what other factors are at play, which leads us nicely to our next point.

The great consolidation

dbt ignited a lively conversation when it acquired SDF Labs at the beginning of the year: not about the acquisition itself, but about the broader theme of acquisitions and consolidation in the data technology market.

The past decade saw a significant boom in the data technology market, but now, the data shows a different theme—consolidation.

Image from PitchBook's Q2 2024 Data Analytics report

Sure, this graph is exaggerated by the free money era of 2020-2021, but pair it with the increased number of acquisitions we’ve seen in the data world, and there is a definite theme of tool consolidation.

What does this mean for the data folks actually using the tools? Well, the jury's still out. An acquisition can give your data stack a major boost, as your existing tools might get a lot more capable, but we've also all seen very capable tools get neglected by the companies that acquire them.

AI’s impact on day-to-day data engineering work

This topic could probably be a standing theme in the Overheard in data column, as AI’s implications for data engineering are only going to grow. This month’s edition comes from a well-thought-out graph from Zach Morris Wilson on which aspects of data engineering work will be most impacted by LLMs, and which aspects will be least impacted.

I think Zach's analysis is spot on here. Already, AI can spin up SQL queries that used to take much more time to write. It can even help with some of the soft-skill aspects of the job, like answering business questions about data: stakeholders can simply feed the data to their AI tool of choice and get a pretty decent understanding of what's happening.

The aspects of the job that AI isn’t likely to disrupt any time soon, though, are the highly strategic elements. Understanding context, business logic, and strategy are the skills that data engineers will likely spend more of their time on in the coming years. They’ll also be the skills that set data engineers apart.

Painful platform migrations

People in any profession and discipline can attest to how painful platform migrations can be. However, data platform migrations are acutely painful. That sentiment was echoed in this Reddit thread from r/dataengineering and in the comments of my own LinkedIn post.

Here’s what usually happens when this kind of migration takes place:

  1. Executives pick the vendor based primarily on projected cost savings.
  2. Even if the migration is funded by "vendor credits," those credits run out long before completion.
  3. Reality hits: the true cost includes engineering time, delayed projects, and accumulated technical debt.
  4. The sunk cost fallacy keeps the migration going even when the math no longer makes sense.

When leadership is too far removed from the day-to-day reality of engineering, these are the kinds of results you get. A platform migration isn't just copying files; it's rewriting integrations, retraining teams, and maintaining two systems during the transition.

Too often, the cost of migration isn't factored into these decisions. Even if you manage to trim your monthly spend by a bit, the migration cost can push your ROI years down the line.
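
To make that concrete, here's a back-of-the-envelope sketch of the payback math. Every figure in it is a made-up placeholder; swap in your own estimates.

```python
# Rough payback math for a platform migration. All figures are hypothetical.
monthly_savings = 10_000      # projected reduction in vendor spend per month
engineering_hours = 4_000     # hours the migration actually consumes
hourly_cost = 120             # fully loaded engineering cost per hour
vendor_credits = 100_000      # credits that offset part of the work

migration_cost = engineering_hours * hourly_cost - vendor_credits
payback_months = migration_cost / monthly_savings

print(f"True migration cost: ${migration_cost:,.0f}")
print(f"Payback period: {payback_months:.0f} months (~{payback_months / 12:.1f} years)")
# And that still ignores delayed projects and accumulated technical debt.
```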

That's why technical leaders need to involve practitioners when making platform decisions. Otherwise, you'll end up losing money on unnecessary migrations, and potentially losing some of your top talent along with it.

Moving off Pandas—a migration that might be worth it?

After covering why tool migrations are so painful, let's talk about one that might actually be worth it.

If you've been working with data in Python, you've almost certainly built workflows around Pandas. It’s been a staple for data manipulation since 2008, but it’s not without its pain points (one might call them “Panda Pains”). Santiago Valdarrama recaps some of them for us in his LinkedIn post.

The issue with Pandas today? Its single-threaded architecture and memory inefficiency are showing their age, and modern data workflows demand speed that Pandas simply wasn't designed to deliver.

Despite these limitations, Pandas maintains its crown for one compelling reason Santiago highlights: "Nobody wants to learn a new API." Let's be honest—we're all reluctant to rewrite existing codebases, retrain team members, or risk introducing new bugs into production pipelines that are (mostly) working.

Santiago introduces FireDucks as a potential solution, promising not only better performance but also an easy migration process: simply change a single import line (`import fireducks.pandas as pd`) or use an import hook (`python -mfireducks.imhook yourfile.py`).
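
For anyone curious what that swap looks like in practice, here's a minimal sketch. The DataFrame and column names are made up; the only FireDucks-specific part is the import line.

```python
# The drop-in swap described above: only the import line changes.
# (Previously: import pandas as pd)
import fireducks.pandas as pd

df = pd.DataFrame({
    "region": ["us-east", "us-west", "us-east", "eu-west"],
    "revenue": [120.0, 95.5, 143.2, 87.9],
})

# Existing Pandas-style code keeps working against the same API.
summary = df.groupby("region")["revenue"].sum().reset_index()
print(summary)
```

Alternatively, leave the script untouched and run it through the import hook (`python -mfireducks.imhook yourfile.py`) from Santiago's post.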

This is a good counterpoint on when it actually makes sense to switch platforms: growing pain and frustration, combined with a lower barrier to switching, make it much easier to say yes to a change.

Did we miss anything? Find my LinkedIn post to comment on the other interesting topics you’ve seen in the data world this past month.
