Data vibe coding, SQL etiquette, and how to land a data engineering gig

Welcome to Overheard in data, a monthly roundup of news you can use from around the data world.

Kevin Hu, PhD and Will Harris

March 26, 2025

Kevin Hu, PhD: Co-founder / Data and ML

Will Harris: Writer / Data

The data world is, as you might expect, chronically online. It’s also pretty scattered. There are tons of great conversations happening across Reddit, Twitter/X, Substack, Medium, and yes, even LinkedIn.

So, in the spirit of data engineering, here’s an attempt to take information from multiple sources, funnel it to a central location, and format it for you to read and enjoy. Welcome to Overheard in data, March ’25 edition.

Data and vibe coding

I’m not exactly sure what to even call vibe coding. A hobby? A discipline? A method? While its effectiveness is still up for debate, there’s no doubt it’s pretty fun and a quick way to get a concept from idea to minimum viable product.

While most of the vibe coding projects we’ve seen so far have been in the software world, Jacob Matson’s post raised the question of how it translates to data.

Can one “vibe code” a data pipeline?
— Jacob Matson (@matsonj) March 4, 2025

The vibe coding trend raises interesting questions for data engineering—particularly around how much engineers should rely on LLMs for production-ready code.

Why this matters for data teams:

1. SQL queries generated by LLMs often look correct but can silently introduce errors, especially with complex transformations or edge cases

2. When data engineers don't fully understand their pipelines, debugging becomes challenging when something breaks

3. The path of least resistance is tempting and there are genuine efficiency gains to be had
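To make the first point concrete, here's a minimal sketch of a query that looks right and is quietly wrong. Everything in it is hypothetical: the `orders` and `shipments` tables, the sample rows, and the "total revenue for shipped orders" request are made up for illustration, and it uses Python's built-in `sqlite3` module so it runs anywhere. The join repeats an order once per shipment, so an order that ships in two parcels gets counted twice.

```python
import sqlite3

# Hypothetical schema and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE shipments (shipment_id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
    -- Order 1 shipped in two parcels: the edge case that triggers the bug.
    INSERT INTO shipments VALUES (10, 1), (11, 1), (12, 2);
""")

# Plausible-looking query for "total revenue for shipped orders".
# The join emits one row per shipment, so order 1's amount is summed twice.
naive_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN shipments s ON s.order_id = o.order_id
""").fetchone()[0]

# One possible fix: check that a shipment exists instead of joining to it.
correct_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    WHERE EXISTS (SELECT 1 FROM shipments s WHERE s.order_id = o.order_id)
""").fetchone()[0]

# A cheap guard: the join should not return more rows than there are orders.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM orders o
    JOIN shipments s ON s.order_id = o.order_id
""").fetchone()[0]
order_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
fan_out_detected = joined_rows > order_rows

print(naive_total)       # 250.0 (inflated)
print(correct_total)     # 150.0
print(fan_out_detected)  # True
```

Neither query raises an error, which is the whole problem. A row-count check like the one at the end is one simple way to catch this class of bug, whether a person or an LLM wrote the SQL.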

The most effective data engineers I know are finding the sweet spot: using AI to accelerate routine tasks (especially via tab completion) while deepening their understanding of core systems. They can explain every critical pipeline component, even if an LLM helped draft it.

I hope we can continue to walk that line, where GenAI tools augment our capabilities rather than replace our expertise. Expect the teams that successfully thread this needle to start pulling ahead of the pack.

Data as a mold problem

On LinkedIn, Stephen Bailey gave a really thoughtful framework for analyzing problems in terms of either:

Rust logic: Treating systems as mechanical, with aging/wearing/corrosion as primary risks
Mold logic: Treating parasitic/invasive growths as primary risks

I love how this perspective reframes our understanding of data issues: "Bad data is like mold spores; flexible schemas are like unsealed surfaces. Once they get in, it's actually unknowable where they might end up." Dashboard graveyards aren't just cluttered—they're decomposing feeding grounds for data parasites.

Rather than attempting complete sterilization, Bailey suggests "managed overgrowth"—treating data management like gardening. Allow some natural decay, limit growth in specific areas, and replace compromised systems with hardier varieties.

This approach shifts what makes a 10x data engineer away from pure technical prowess and toward ecosystem awareness: understanding how bad data infiltrates and spreads throughout your systems.

Why are we all screaming in SQL?

I don’t have much insight to add to this one, but it made me laugh more than any other data engineering post all month.

Why are SQL devs always yelling? pic.twitter.com/jVdhbZp8oz
— Alberta Tech (@albertadevs) March 14, 2025

What’s your SQL style—all caps? All lowercase? Somewhere in between?

Why you aren’t landing a DE job (and how to land one)

One of the most common topics of conversation across r/DataEngineering and beyond is how to land a job in data engineering—whether you’re a fresh grad or switching from an adjacent profession.

Why you aren't getting a DE job
by u/pawtherhood89 in dataengineering

One Reddit user laid out some reasons why they think you might not be landing the gig. The chief reasons:

We’re seeing fewer entry-level roles because LLMs perform much of that work
Data engineering never really was an entry-level role, but more of something people stumbled into
You live outside of a major city and, therefore, your options for working somewhere hiring DE roles are limited
You need to work on your people skills (maybe it’s all the SQL screaming?)

I also came across some help for those of you looking to nail a data engineering interview.

Here’s everything you need to know to pass the SQL data engineer interview in big tech! pic.twitter.com/Af10XIUehP
— Zach Morris Wilson (@EcZachly) March 11, 2025

Big thanks to Zach Morris Wilson on Twitter/X, who put out a concise but very valuable cheat sheet for passing your SQL interviews. It includes practical SQL tips like when to use specific functions over others, as well as soft skill tips like asking questions about the assignment and being able to articulate your rationale.