The Rise of LLMs and Why It Feels Like Coming Home

Published November 2025 · 6 min read

By Kimberly Shenk · Co-Founder and CEO, Novi

After more than fifteen years working in data, starting as a data scientist and Captain in the U.S. Air Force, continuing through my research at Draper Laboratory, and then leading data science teams at tech companies like Eventbrite, this new wave of large language model (LLM) adoption feels deeply personal. For all the hype about LLMs, what’s actually happening isn’t new. It’s the same story I’ve lived my whole career: the data matters more than the model.

What the Air Force Taught Me

When I joined the Air Force, data wasn’t sexy. We were working with messy, incomplete, often siloed datasets, trying to inform decisions that had real human consequences. One mission in particular sticks with me: supporting a research operation to Antarctica (part of what the military calls Operation Deep Freeze). A typical flight sent a C-130 out of Hawaii, fully loaded with personnel and equipment, toward Antarctica. But nearly 70% of missions “boomeranged”: they took off, the weather turned bad mid-flight, and they had to turn back. That was expensive in both fuel and time, and sometimes the margin for error was razor-thin.

We were tasked with building weather prediction models to significantly reduce the number of boomerangs. Data verification was baked into the mission: historical flight logs, layered with weather forecasts and runway-condition reports. We spent months collecting, cleaning, structuring, and validating years of data before building predictive “go/no-go” rules. Every false negative or mis-forecast could mean millions of dollars.
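
To make the “go/no-go” idea concrete, here’s a minimal sketch of what such a rule can look like in code. The thresholds and field names are my own illustrations, not the actual mission criteria:

    # Hypothetical go/no-go check. Thresholds and fields are
    # illustrative, not the real Operation Deep Freeze criteria.
    from dataclasses import dataclass

    @dataclass
    class Forecast:
        crosswind_kts: float    # forecast crosswind at the destination runway
        visibility_mi: float    # forecast visibility on arrival
        storm_prob: float       # chance a storm crosses the route mid-flight

    def go_no_go(fc: Forecast) -> bool:
        """Launch only if every condition clears its threshold.

        A single marginal reading scrubs the mission: a false "go"
        means a boomerang, which costs far more than a delayed launch.
        """
        return (
            fc.crosswind_kts <= 25.0
            and fc.visibility_mi >= 3.0
            and fc.storm_prob <= 0.20
        )

    print(go_no_go(Forecast(crosswind_kts=18.0, visibility_mi=5.0, storm_prob=0.35)))  # False: storm risk too high

The rule itself is trivial; the months of work were in making sure the inputs feeding it could be trusted.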

Here’s the parallel: today, businesses adopting AI face the same kind of invisible risk. The model (the part everyone sees) is just the tip of the iceberg. It’s the unglamorous but mission-critical work of verifying, structuring, and validating the inputs (your data) that determines whether you land or you boomerang. If your data is wrong or unverified, the AI will still spit something back; it’s garbage in, garbage out, and it can cost you millions.

I Almost Quit MIT Over This Data Problem

Later, during my thesis at the Massachusetts Institute of Technology (MIT), where I was predicting heart attacks and congestive-heart-failure events from health-insurance claims data, I spent roughly 98% of my time cleaning and organizing the data.

Why?

One day, well into my research, I discovered a critical data error. One of the fields I was using as a predictor had miscoded values across tens of thousands of records: a billing-code field that had been reused without versioning. It was subtle. The distribution looked okay at first glance, but the moment I drilled in, I spotted an abnormality that meant my model was partially fitting a bad proxy. Instead of predicting true health outcomes, it was predicting patterns created by the data error.
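
An audit for this class of error doesn’t require anything fancy: if a code’s relationship to the outcome jumps sharply between time periods, the field may have been silently reused. Here’s a minimal pandas sketch, using made-up column names and toy data rather than my actual thesis schema:

    import pandas as pd

    # Toy claims extract; columns and values are invented for illustration.
    claims = pd.DataFrame({
        "billing_code": ["A12"] * 6 + ["B34"] * 6,
        "service_year": [2004, 2004, 2005, 2005, 2006, 2006] * 2,
        "outcome":      [0, 1, 0, 1, 0, 1,   # A12: stable outcome rate
                         0, 0, 0, 0, 1, 1],  # B34: rate jumps in 2006
    })

    # For each code, compare the outcome rate across years. A code whose
    # meaning is stable drifts gently; a silently reused code jumps.
    rates = (
        claims.groupby(["billing_code", "service_year"])["outcome"]
        .mean()
        .unstack("service_year")
    )
    suspicious = rates[rates.max(axis=1) - rates.min(axis=1) > 0.5]
    print(suspicious)  # flags B34, whose rate swings from 0.0 to 1.0

Checks like this are cheap to run and boring to write, which is exactly why they get skipped.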

When I fixed it, by diving back into the data and re-engineering the features, the model’s headline performance dropped by 10 points, but it behaved far more realistically on hold-out data. That fix saved the research. I nearly walked away from the whole project when the error surfaced (hence the “I almost quit” moment). But I pushed through and learned one of the biggest lessons of my career: you cannot outrun bad data with a better model.
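
Why did the score drop when the data got better? Because a field that secretly encodes the outcome inflates every metric it touches, including hold-out scores when the bad field contaminates the test data too. A synthetic sketch (data invented purely for illustration, not my thesis data) shows the effect:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    n = 5000
    X_real = rng.normal(size=(n, 5))          # legitimate signals
    y = (X_real[:, 0] + rng.normal(scale=2.0, size=n) > 0).astype(int)
    leak = y + rng.normal(scale=0.1, size=n)  # a field that encodes the outcome

    for name, X in [("with leaky field", np.column_stack([X_real, leak])),
                    ("after the fix", X_real)]:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        print(name, round(auc, 3))  # near-perfect with the leak; lower but honest without

The lower number is the true number. The inflated one was the data error talking.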

Today, brands and companies rushing into AI are making exactly that mistake. They gloss over the data and go straight to the model outputs, searching for a silver bullet. If you feed the AI junk, it learns your junk and gives your junk back.

It’s Still 80% Unsexy Work, and That’s a Good Thing

Over the years I’ve been saying the same thing: whether you call it data science, machine learning, or now AI, the fundamentals are identical. Cleaning, transformation, and feature engineering underpin everything; the flashy model comes last. I even dug up a blog post I wrote ten years ago, “Building a Model Is the Least Important Part of Your Job.” Back then, it felt contrarian. Today, with ChatGPT and other LLMs taking center stage, it feels like a full-circle moment.

The companies that win in this moment are the ones doing the work others ignore. Because data quality is your competitive advantage.

With LLMs such as ChatGPT gaining momentum, the interface may look different, but the rule hasn’t changed; if anything, the stakes are higher. These systems learn patterns from whatever you feed them, and they learn fast. If your data is garbage, the results won’t just reflect the garbage, they’ll amplify it.

Why This Moment Feels Like the Culmination

Now, as a founder and data advocate, I’m more excited than ever. I’ve seen the journey: the Air Force cockpit, the lab coat, the tech product team, the late-night data wrangling. All along, the through-line has been a love of data: what it can do, how it should be treated, why it matters. To me, this is the moment when the infrastructure I’ve cared about finally gets the spotlight it deserves.

I’m driven by a conviction that doing the unglamorous work is the real differentiator. In this moment, I get to help teams adopt strategies that honor rigor and trust: building data foundations that are robust, disciplined, and auditable. The models may change tomorrow, but the work of data integrity and quality will always matter.