Perplexity, Google, and the End of Pre-training
Why Harvard’s fireside chat with Aravind Srinivas reshaped how I think about AI’s future - and what it means for Google, AGI, and us
You can watch the full episode above - honestly, it’s one of the best AI conversations I’ve come across in a long time. Aravind Srinivas (CEO and co-founder of Perplexity) sits down with Professor Jim Waldo at Harvard Innovation Labs to talk through what comes next after pre-training, why reasoning matters more than ever, and how Google might have scaled itself into a corner.
I watched it twice. And then things really started clicking.
If you’re building with AI, trying to make sense of what’s hype and what’s real, or wondering what makes Perplexity so different from Google - it’s well worth a listen.
The Podcast in a Nutshell
Here are the core takeaways from the chat:
- The pre-training era is over. We’ve scraped the web, ingested every book, video, and Wikipedia page - and now there’s nothing left to feed the machine that it hasn’t already seen.
- Post-training is where the real work starts. This next phase is about teaching AI to reason, not just recite - coaching it to follow logic, complete tasks, and learn from mistakes.
- Open source is catching up fast. DeepSeek, for example, has built a reasoning-capable model using smarter training techniques and fewer resources.
- Perplexity’s strategy is ruthlessly pragmatic. Instead of building from scratch, they started by wrapping existing models, building user data, and waiting for open source to level the playing field.
- Google is now too big to pivot easily. Their scale, their ad model, and their reputation all make generative AI hard to adopt without hurting their own business.
- AGI isn’t here - and probably never will be. Both Srinivas and I think we’re chasing the wrong thing.
- Those who understand AI will thrive. Everyone else? Risks being left behind.

Glossary
A few terms worth knowing if you’re diving into this conversation:
- Pre-training – The initial phase of building a large language model (LLM), where it ingests huge datasets (books, articles, websites) to learn general patterns in language.
- Post-training – The newer, more nuanced phase where the model is refined, coached, and taught how to reason, complete tasks, and improve based on human feedback.
- AGI (Artificial General Intelligence) – A still-theoretical AI capable of understanding and performing any intellectual task a human can do. We’re not there yet - despite what some headlines suggest.
- LLM (Large Language Model) – The underlying model powering tools like ChatGPT and Perplexity, trained on massive amounts of text to generate language-based responses.
- Perplexity AI – A search-focused AI assistant that delivers direct, cited answers by querying the web in real time - like Google Search with clarity and no ads.
- AI Mode (Google) – A new generative AI overlay on Google Search results that attempts to answer questions directly, now rolled out to users in the US.
- Open Source AI – Models like Mistral and DeepSeek that are released publicly, allowing others to inspect, improve, and build upon them without proprietary restrictions.
- Synthetic Data – Data generated by AI to train other AI systems, often used when human-curated datasets are limited or expensive to obtain.
We’ve Run Out of Stuff to Train On
This was one of my favourite insights from the whole chat - something I’ve been thinking about a lot.
Pre-training is basically done. That part of AI’s growth, where we just shove trillions of words and pixels into a model and let it absorb common sense, is maxed out.
The internet’s been scraped. The books have been read. The videos have been watched. AI has seen what we’ve seen - and now, it knows “stuff.” But it doesn’t know that it knows it. And it doesn’t know what it doesn’t know.
It just repeats what it’s absorbed. It doesn’t question itself. It doesn’t reflect. It doesn’t course-correct. And that’s the problem.
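If that sounds abstract, here’s a deliberately tiny sketch of what that “absorbing” amounts to. Real pre-training runs a neural network over trillions of tokens; this toy version (everything in it is made up for illustration) just counts which word follows which - but the core objective, predicting the next token from what it has already seen, is the same shape.

```python
from collections import Counter, defaultdict

# A made-up toy "corpus" standing in for the scraped web, books, and videos.
corpus = "the cat sat on the mat the dog sat on the rug".split()

# "Pre-training", boiled down: count which token tends to follow which.
# Real LLMs learn the same kind of statistics with neural networks at vast scale.
next_token = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_token[current][following] += 1

def predict(token: str) -> str:
    """Return the most common next token seen during 'training'."""
    counts = next_token.get(token)
    return counts.most_common(1)[0][0] if counts else "<no idea>"

print(predict("the"))   # a word it has seen follow "the"
print(predict("sat"))   # "on" - the only thing it ever saw after "sat"
print(predict("moon"))  # "<no idea>" - never seen it, nothing to repeat
```

Everything this model can ever say is already sitting in those counts. It can’t tell you which predictions it’s unsure about, and it has no way to notice it was wrong - which is exactly the gap the next phase has to fill.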
From here on out, it’s about teaching AI, not just feeding it. Coaching it, correcting it, showing it how to behave when things go wrong. We have to teach it to reason, to react, to not take shortcuts. These are the traits that make human intelligence so powerful. We’re adaptable. We question ourselves. We learn from getting it wrong.
AI doesn’t do that unless we teach it to.
Sure, people will use synthetic data and have AI teach other AIs. But that’s not enough. The nuanced, real-world stuff - how to deal with ambiguity, what to do when you’re not sure, how to recover from a wrong answer - that needs humans in the loop.
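To make “humans in the loop” a little more concrete, here’s a minimal sketch of the idea - with every name and number in it hypothetical. The model offers two candidate answers, a reviewer says which one they prefer, and repeated feedback nudges the model toward the preferred behaviour. Real post-training uses techniques like supervised fine-tuning and RLHF over large preference datasets, but the loop has the same shape.

```python
# Two hypothetical styles of answer the model might give to the same question.
candidates = {
    "overconfident": "The answer is definitely X.",
    "reasoned": "Based on the sources I checked, it's probably X - here's why...",
}

# Scores the post-training loop adjusts; higher means "produce more of this".
preference = {style: 0.0 for style in candidates}

def human_feedback(option_a: str, option_b: str) -> str:
    """Stand-in for a real reviewer: we assume people consistently prefer
    the answer that shows its reasoning and admits uncertainty."""
    return "reasoned"

def post_training_step(step_size: float = 0.1) -> None:
    a, b = candidates
    winner = human_feedback(a, b)
    loser = b if winner == a else a
    preference[winner] += step_size   # reinforce the preferred behaviour
    preference[loser] -= step_size    # discourage the other one

for _ in range(20):
    post_training_step()

print(preference)  # the coached style accumulates weight over repeated feedback
```

The point isn’t the arithmetic - it’s where the signal comes from. No new scraped text appears anywhere in that loop; the model only improves because people judged its behaviour.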
That’s why post-training is where all the hard (and interesting) work is now.
Why Google’s Size Is Now Its Biggest Handicap
Let’s talk about Google’s new “AI Mode.”
It’s now live for everyone in the US. I’ve been testing it myself (via VPN - easy enough to do), and it’s… well, fine. It works. It answers questions. It doesn’t hallucinate too badly. But it’s cautious. Flat. It feels like something that’s been carefully nerfed to stay out of its own way.
And now, after listening to the Perplexity fireside chat, I get why.
This isn’t just a design choice. It’s a fundamental limitation of Google’s business model. Because here’s the uncomfortable truth:
- If AI gives people direct answers, they won’t click links.
- If they don’t click links, Google’s advertisers lose visibility.
- If advertisers lose visibility, the entire Google revenue machine breaks down.
Generative AI is an existential threat to the very thing that makes Google money. That’s not dramatic - it’s just reality. So of course their AI Mode feels restricted. It has to be.
Meanwhile, Perplexity doesn’t carry that weight. They have no legacy business model to protect. No platform that relies on traffic, click-through rates, or sponsored positioning. They’re not trying to funnel you into the open web - they are the destination.
That difference is more than technical. It’s philosophical.
Why AI Mode Is a Plaster (Band-Aid for Americans), Not a Leap
The thing is, I don’t think Google is standing still. I’m sure they’ve got the talent, the compute, and the product teams to make something incredible. But the incentives are all wrong.
The ad model demands traffic. The AI model removes it. You can’t optimise for both.
That’s why AI Mode today feels more like a plaster than a breakthrough. It’s an attempt to keep up with what people expect from AI search - without fully committing to what that means.
They’re building an AI that tells you what you need to know… but also nudges you toward clicking something anyway. You get an answer, sure, but not quite enough of one. And that’s not because the model can’t do better. It’s because doing better would disrupt the model - the business model.
The Google We Grew Up With Is Gone
This is the part that’s hard to admit - but it’s true.
There was a time when Google was useful. You typed in a question, and you found exactly what you needed on the first page. It felt like the whole internet was suddenly at your fingertips - neatly ordered, de-duplicated, de-fluffed.
Now? It’s a swamp. Clickbait. Ads disguised as answers. Answers written by other AIs quoting each other in circles. It’s SEO games and affiliate farms. And unless you’re hyper-specific, you spend more time digging through junk than learning anything.
Perplexity feels like a reset. A moment of clarity. Like we’re back to when search worked - before it was monetised to death.
But we all know how this ends.
If we’re not careful, AI tools will go the same way. Tracked. Optimised. Jammed with sponsored results. Designed not to help, but to extract value from your attention.
And that’s why I keep saying: let’s enjoy this window while it lasts.
On AGI Hype: It’ll Happen on a Technicality
This part of the talk really hit home for me, because it echoed something I’ve felt for a while now.
We’re not going to wake up one day and see the news: “AGI Achieved.” That’s not how it works.
What’s going to happen - what’s already happening - is this: someone will build a model that’s incredibly good at doing a handful of high-value tasks. Writing strategy documents. Debugging code. Conducting research. Running simulations. And they’ll slap the AGI label on it.
That doesn’t make it AGI. It makes it useful. And frankly, that’s more important.
But here’s the bit that really gets me - we were way too quick to slap the “AI” label on what we have today. We called everything “AI” before we’d even figured out what that should really mean. So now we’ve had to invent a new term - AGI - to describe the actual thing we were originally pointing to.
It’s like we overpromised and then quietly moved the goalposts. My fridge says it has “AI.” It doesn’t. It just learns when I open the door a lot and tries not to defrost my salad drawer. That’s not intelligence. That’s a glorified timer with some sensor logic.
And this is happening everywhere.
Most of what we call AI today isn’t intelligent. It’s predictive. It’s reactive. It’s trained pattern recognition. It can’t reason, or introspect, or make decisions based on uncertainty. It’s impressive, but it’s not self-aware - and certainly not general.
AGI: A Goalpost That Keeps Moving
The AGI label, like “cloud” or “web3” or “the metaverse,” is becoming more of a marketing term than a technical one. It gives investors something to chase and PR teams something to dangle.
But let’s be honest - even if someone did build something that passed every benchmark today, the definition would simply shift:
- “It can’t be AGI unless it understands context like a human.”
- “It can’t be AGI unless it can love, or create art, or understand irony.”
- “It can’t be AGI unless it writes a novel that makes me cry.”
We don’t need it to do any of that. What we need is AI that’s thoughtful, coachable, reliable, and aligned. We need tools that don’t just autocomplete our emails - but extend our thinking, support our workflows, and scale our impact.
That’s not “general” intelligence. That’s practical intelligence. And honestly, it’s already here - if you know how to work with it.
Let’s Ask a Better Question
So when it comes to all the great minds out there chasing AGI - or the clever marketing teams saying they’re chasing AGI to satisfy society’s never-ending appetite for “the next big thing” - I think we should stop and ask a better question.
One of my favourite films asks it perfectly:
“Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.”
— Jurassic Park
Still, to this day, the best AI ethics quote ever written.

Just because we can build something, doesn’t mean we’ve thought hard enough about how we’ll use it, who benefits, or what we might break in the process.
Explore the Podcast with an Interactive Agent
If you’re the kind of person who likes to dig a little deeper or ask your own questions, I’ve built something for you.
Thanks to Google’s NotebookLM, you can now interact with a custom-built agent grounded in this episode. It understands the conversation in detail and can help answer follow-up questions, explain concepts, or clarify anything you didn’t quite catch.
Try it here:
https://notebooklm.google.com/notebook/c963c185-3534-4e15-8318-6a5bde65eea0
To use it, you’ll need:
- A Google account
- To be over 18 (you can verify your age at myaccount.google.com/age-verification)
This is still experimental tech, but it works surprisingly well. Feel free to test it, explore ideas, or just see what else the podcast didn’t say out loud.
Final Thought: Use This Era Wisely
Here’s where I’ve landed after watching the Harvard talk, testing AI Mode, and reflecting on where this whole thing is going:
We’re living through a rare window of clarity.
- Perplexity is giving real answers - not links, not clickbait.
- ChatGPT is still ad-free (though ads are coming - OpenAI have confirmed it).
- AI products are still being designed for usefulness, not manipulation.
- Open-source models are rising, getting better, and doing it in the open.
But this won’t last forever.
We’ve seen this story before. The open web got monetised. Social media became ad-driven echo chambers. Even early Google - arguably one of the best information tools ever built - eventually turned into a pay-to-play SEO jungle.
So if you’re building AI tools right now - build for people, not for click-throughs.
If you’re leading others through this space - teach them how to use AI wisely, not just quickly.
And if you’re exploring it for yourself - take your time. Get to know what’s under the hood. Understand what the model does well - and what it doesn’t.
Because that understanding? That’s what will protect you when the UIs change, the ads sneak in, and the magic starts to fade.
This moment we’re in?
It’s brief. It’s beautiful. And it’s worth making the most of - before someone tries to sell it back to us.