The AI Engineer Roadmap

A Developer's Guide to Actually Understanding AI

This is Post 0 of an 8-part series where I break down AI engineering concepts — from embeddings to production systems — using practical examples and code.

I have a small confession.

For a long time, whenever someone said things like “we should use RAG for that” or “let's just fine-tune the model” in meetings, I'd just nod along.

Then later I'd Google it, skim a couple of blog posts, convince myself I understood it, and move on.

If that sounds familiar, you're not alone.

I've been a developer for a while. I know how to navigate codebases, build features, and debug things when they inevitably break. Just normal software engineering stuff.

But when AI suddenly became the topic of every standup, product review, and tech discussion, I realized something: I could use the tools, but I didn't really understand what was happening underneath.

I could prompt my way through problems using ChatGPT or Claude. But if someone asked me:

  • How do embeddings actually work?
  • Why does RAG reduce hallucinations?
  • When should you fine-tune instead of improving your prompts?

...I'd probably struggle to explain it clearly.

That gap started bothering me.

So I decided to fix it.

Why I'm Writing This Series

This isn't another “AI is changing the world” series.

There are already thousands of those.

Instead, this is the series I wish existed when I started trying to learn AI engineering properly.

Each post will follow a simple format:

  • Explain the concept in plain English
  • Use a real-world analogy that actually makes sense
  • Show code that you can run and experiment with

The audience is developers and engineering managers who already build software but want to understand AI well enough to build with it, not just call an API and hope for the best.

The Roadmap

This series will have 8 posts.

The Foundations (Posts 1-4)

Post 1: How LLMs Actually Think
Tokens, embeddings, transformers, and context windows. The basic pieces that everything else depends on.

Post 2: RAG — Giving LLMs Your Knowledge
Why LLMs hallucinate and how retrieval-augmented generation helps solve that. We'll also build a simple RAG pipeline.

Post 3: AI Agents — Making LLMs Do Things
Moving from “answer this question” to “figure it out and complete the task.” We'll explore tool use and decision loops.

Post 4: Fine-tuning — When Prompting Isn't Enough
When you should fine-tune a model (and when you definitely shouldn't).

The Advanced Track (Posts 5-8)

Post 5: Real-time AI Systems
Streaming responses, WebSockets, and building interfaces that don't make users wait forever.

Post 6: Choosing the Right Model
How to evaluate models for your use case instead of blindly picking whatever is trending that week.

Post 7: Production AI Systems
Caching, cost control, monitoring, and the things that actually matter once something goes live.

Post 8: Building an AI Product (Capstone)
Putting everything together into a working system.

The Project We'll Build

Theory is useful, but things only really click when you build something.

So throughout the series we'll build a single project that grows with each post: an AI-powered Code Review Bot.

Why a code review bot?

Because it naturally requires most of the concepts we'll cover.

In Post 1, we'll understand how the bot “reads” code using tokens and embeddings.
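
To give a taste of what that means, here's a toy sketch of how text becomes numbers a model can work with. The vocabulary and two-dimensional vectors below are made up for illustration; real models use learned subword tokenizers and embedding tables with thousands of dimensions.

```python
# Toy sketch: text -> token ids -> vectors.
# Vocabulary and embeddings are invented for illustration only.
vocab = {"def": 0, "review": 1, "(": 2, ")": 3, ":": 4}

embeddings = [
    [0.1, 0.3],  # "def"
    [0.7, 0.2],  # "review"
    [0.0, 0.9],  # "("
    [0.0, 0.8],  # ")"
    [0.5, 0.5],  # ":"
]

def tokenize(text: str) -> list[int]:
    """Split on whitespace and map each piece to a token id."""
    return [vocab[piece] for piece in text.split()]

def embed(token_ids: list[int]) -> list[list[float]]:
    """Look up one vector per token id."""
    return [embeddings[t] for t in token_ids]

ids = tokenize("def review ( ) :")
print(ids)             # [0, 1, 2, 3, 4]
print(embed(ids)[0])   # [0.1, 0.3]
```

That's the whole trick at its core: the model never sees your code as text, only as sequences of vectors like these.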

In Post 2, we'll add RAG so the bot can understand your codebase and coding standards.

In Post 3, the bot becomes an agent. It can receive a PR, decide what files to analyze, retrieve context, and generate feedback.

In Post 4, we'll explore whether fine-tuning makes sense for learning your team's coding style.

Then the advanced posts make it production-ready:

  • Streaming feedback instead of waiting for one big response
  • Using different models depending on the task
  • Adding caching, monitoring, and cost tracking
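
As a preview of the kind of thing Post 7 covers, here's a minimal response cache: hash the prompt, return the stored answer on a repeat, and only call the model on a miss. `call_model` is a stand-in here, not a real API.

```python
import hashlib

cache: dict[str, str] = {}

def call_model(prompt: str) -> str:
    """Stand-in for a real LLM call; imagine this costs money and time."""
    return f"review for: {prompt}"

def cached_review(prompt: str) -> str:
    """Return a cached answer for an identical prompt, else call the model."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in cache:
        cache[key] = call_model(prompt)
    return cache[key]

first = cached_review("def add(a, b): return a + b")
second = cached_review("def add(a, b): return a + b")  # served from cache
assert first == second
```

A dozen lines like this can cut real costs, since code review requests often repeat the same diffs.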

By the end, we'll have a working system — and along the way you'll understand how each piece works.

How the Posts Connect

The order of topics is intentional.

First we understand tokens and embeddings. Then we use embeddings for RAG.
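
To make that connection concrete, here's a sketch of the retrieval step at the heart of RAG: embed everything, then find the stored chunk whose vector is closest to the query's. The two-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """How closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend each document chunk has already been embedded.
chunks = {
    "Use snake_case for function names": [0.9, 0.1],
    "All endpoints must be rate-limited": [0.1, 0.9],
}

def retrieve(query_vector: list[float], chunks: dict) -> str:
    """Return the stored chunk most similar to the query vector."""
    return max(chunks, key=lambda text: cosine_similarity(chunks[text], query_vector))

# A query vector pointing roughly the same way as the naming rule.
print(retrieve([0.8, 0.2], chunks))  # "Use snake_case for function names"
```

Everything else in a RAG pipeline (chunking, indexing, prompt assembly) is built around this one similarity search.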

Once we have RAG, it becomes easier to build agents that retrieve information and take actions.

And finally we talk about fine-tuning, which answers the question: “What happens when prompting and RAG still aren't enough?”

The later posts focus on what happens after the prototype works — making it fast, reliable, and production-ready.

Who This Is For

You'll probably get the most value from this if:

  • You're a software developer
  • You use AI tools regularly but want to understand how they work
  • You want to build AI features into products
  • You prefer practical examples over academic explanations

You don't need an ML background.

If you can read code and understand API calls, you'll be fine.

What This Series Won't Cover

I'm not an ML researcher.

This series won't dive into training neural networks from scratch or designing new model architectures.

There are great resources for that already.

This series is about AI engineering: how developers can build real systems using existing models.

Share your feedback on Instagram.

Let's Start

Post 1 comes next.

We'll start with the basics: tokens, embeddings, transformers, and context windows.

By the end of that post, you should have a clear picture of what actually happens when you type a prompt and press enter.

If you've ever felt like you're nodding along to AI discussions without fully understanding them — that's exactly why I'm writing this series.

Next up: Post 1 — How LLMs Actually Think