A new series by DataPeCharcha
Your 16-week journey to finally understand Large Language Models. No jargon, just clarity on how models like GPT-4 *really* work.
WEEK 1
We start at the very beginning, exploring how early models predicted text by simply counting which words appeared together most often. It's the simple, statistical foundation for everything that followed.
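To make that counting idea concrete, here's a minimal sketch of a bigram "model" on a toy two-sentence corpus (the corpus and function names are invented for illustration):

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    # Count how often each word follows the previous one.
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    # Predict the most frequent follower; None if the word was never seen.
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = ["the cat sat on the mat", "the cat ate the fish"]
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

That's the whole trick: no understanding, just frequency tables. Everything after this week is about doing better than raw counts.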
WEEK 2
This week, we see the first spark of true 'understanding.' Learn how we taught computers that 'dog' and 'puppy' are related, and how RNNs tried to give models a basic memory.
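"Related" has a precise meaning here: words become vectors, and related words point in similar directions. A toy sketch with hand-picked 3-number vectors (the numbers are invented purely for illustration; real embeddings are learned and have hundreds of dimensions):

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, chosen by hand so that 'dog' and 'puppy' point the same way.
vecs = {
    "dog":   [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.75, 0.20],
    "car":   [0.10, 0.20, 0.90],
}
```

With these, `cosine(vecs["dog"], vecs["puppy"])` comes out higher than `cosine(vecs["dog"], vecs["car"])`, which is exactly the geometry that word embeddings learn from data.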
WEEK 3
Early neural nets were forgetful. Discover the clever 'gate' system in LSTMs that gave models a reliable long-term memory, a breakthrough that dominated NLP for years.
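The gates are just learned numbers between 0 and 1 that scale the memory. A single-unit sketch of one LSTM step (scalar weights and the dictionary layout are simplifications for illustration; real LSTMs use weight matrices):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One step of a one-unit LSTM: three sigmoid 'gates' decide what
    to forget, what to write, and what to expose."""
    f = sigmoid(w["fx"] * x + w["fh"] * h)    # forget gate: keep old memory?
    i = sigmoid(w["ix"] * x + w["ih"] * h)    # input gate: accept new info?
    g = math.tanh(w["gx"] * x + w["gh"] * h)  # candidate memory content
    o = sigmoid(w["ox"] * x + w["oh"] * h)    # output gate: reveal memory?
    c_new = f * c + i * g                     # long-term cell state update
    h_new = o * math.tanh(c_new)              # short-term hidden state
    return h_new, c_new
```

Notice the cell state `c` flows through mostly untouched when the forget gate is near 1 — that uninterrupted path is what gives LSTMs their long-term memory.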
WEEK 4
Computers don't read words; they read numbers. We'll break down the essential process of 'tokenization'—turning text into a language machines can finally understand.
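At its simplest, tokenization is a lookup table from pieces of text to integers and back. A word-level sketch (real LLMs use subword schemes like BPE, but the round trip is the same idea):

```python
def build_vocab(text):
    # One integer id per unique word, in alphabetical order.
    return {w: i for i, w in enumerate(sorted(set(text.split())))}

def encode(text, vocab):
    return [vocab[w] for w in text.split()]

def decode(ids, vocab):
    inv = {i: w for w, i in vocab.items()}
    return " ".join(inv[i] for i in ids)

vocab = build_vocab("the cat sat on the mat")
print(encode("the cat", vocab))  # two integers, one per word
```

Encoding then decoding returns the original text — the model only ever sees the integer side.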
WEEK 5
Before the magic happens, we have to prepare the data. Learn how a Transformer gives each word its initial meaning and, crucially, a 'timestamp' so it knows the order of the sentence.
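The 'timestamp' in the original Transformer is a pattern of sines and cosines added to each word's embedding — a unique fingerprint for every position. A compact sketch:

```python
import math

def positional_encoding(pos, d_model):
    """Sinusoidal position 'timestamp': even dimensions get a sine,
    odd dimensions a cosine, each at a different frequency."""
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe
```

Every position gets a distinct vector, and nearby positions get similar ones — which is how the model can tell "dog bites man" from "man bites dog".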
WEEK 6
Self-attention is the revolutionary concept that changed everything. We'll use simple analogies to explain how words 'talk' to each other to figure out the true context of a sentence.
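Under the analogies sits one small formula: each word's query scores every word's key, the scores become weights, and the output is a weighted average of the values. A plain-Python sketch of scaled dot-product attention (toy vectors, no learned projections):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query scores every key,
    then takes a weighted average of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

A query that lines up with one key pulls most of its output from that key's value — that's a word "paying attention" to the words that matter for it.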
WEEK 7
One 'conversation' between words isn't enough. See how Multi-Head Attention allows words to look at the sentence from many different perspectives at once for a richer understanding.
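Mechanically, "many perspectives" just means slicing each word's vector into equal pieces, running attention on each slice independently, and gluing the results back together. The slicing itself is trivially small (a sketch; real implementations also apply learned projections per head):

```python
def split_heads(vector, n_heads):
    """Split one embedding into equal slices, one per attention head."""
    size = len(vector) // n_heads
    return [vector[i * size:(i + 1) * size] for i in range(n_heads)]

def merge_heads(heads):
    """Concatenate the per-head results back into one vector."""
    return [x for head in heads for x in head]
```

Each slice attends on its own, so one head can track grammar while another tracks meaning — then `merge_heads` recombines them.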
WEEK 8
After words gather context, they need time to 'think.' That's the job of the Feed-Forward Network, a key component that processes the information gathered by attention.
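The 'thinking' step is surprisingly plain: two linear layers with a ReLU in between, applied to each word's vector independently. A sketch with tiny hand-sized weights (in real models the hidden layer is typically about 4x wider than the input):

```python
def relu(xs):
    return [max(0.0, x) for x in xs]

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise feed-forward network: linear layer, ReLU, linear layer.
    w1 and w2 are lists of per-neuron weight vectors."""
    hidden = relu([sum(xi * wij for xi, wij in zip(x, col)) + b
                   for col, b in zip(w1, b1)])
    return [sum(hi * wij for hi, wij in zip(hidden, col)) + b
            for col, b in zip(w2, b2)]
```

Because it runs per position, the FFN can't move information between words — attention already did that; this layer processes what each word gathered.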
WEEK 9
Stacking these layers creates problems. Discover the simple but brilliant tricks—'shortcuts' and 'stabilizers'—that allow us to build incredibly deep and powerful models without them breaking.
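The two tricks have names: residual connections (the 'shortcuts') and layer normalization (the 'stabilizer'). Both fit in a few lines (a sketch; real LayerNorm also has learned scale and shift parameters):

```python
import math

def layer_norm(x, eps=1e-5):
    """Stabilizer: rescale a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]

def residual_block(x, sublayer):
    """Shortcut: add the sublayer's output back onto its own input,
    then normalize the result."""
    y = sublayer(x)
    return layer_norm([xi + yi for xi, yi in zip(x, y)])
```

The addition means each layer only has to learn a *correction* to its input rather than rebuild it from scratch — which is why hundreds of layers can stack without the signal (or the gradients) vanishing.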
WEEK 10
How does an LLM learn from the entire internet? We'll cover the simple goal of 'predicting the next word' and the core training loop that makes it all possible.
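The whole training signal reduces to one number per word: how surprised was the model by the word that actually came next? That's cross-entropy loss (a sketch on a toy 4-word vocabulary; the probabilities are invented for illustration):

```python
import math

def cross_entropy(probs, target_id):
    """Next-word prediction loss: minus the log probability the model
    assigned to the word that really came next."""
    return -math.log(probs[target_id])

# Toy model output over a 4-word vocabulary; the true next word is id 2.
probs = [0.1, 0.2, 0.6, 0.1]
loss = cross_entropy(probs, 2)  # small, because the model was fairly confident
```

Training is just: compute this loss over billions of next-word predictions, nudge every weight to lower it, repeat.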
WEEK 11
A raw model is just a text predictor. Learn about Supervised Fine-Tuning (SFT), the process of training the model on examples to turn it into a helpful instruction-follower.
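The core data trick in SFT: concatenate the prompt and the response into one sequence, but mask the prompt so the model is only graded on producing the response. A sketch (the -100 'ignore' value follows a common framework convention, e.g. PyTorch's cross-entropy default; the token ids here are made up):

```python
IGNORE = -100  # label value many training frameworks skip when computing loss

def sft_example(prompt_ids, response_ids):
    """Build one fine-tuning example: full sequence as input, but
    loss labels only on the response tokens."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE] * len(prompt_ids) + response_ids
    return input_ids, labels
```

Feed the model thousands of (instruction, good answer) pairs in this shape and the text predictor learns to answer rather than merely continue.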
WEEK 12
How do we make sure models are helpful and harmless? We'll explore RLHF, the advanced technique that uses human preferences to align the model's behavior with our values.
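At the heart of the reward-model step sits a very old statistics idea, the Bradley-Terry model: the probability that humans prefer answer A over answer B depends only on the gap between their reward scores (a minimal sketch):

```python
import math

def preference_prob(reward_chosen, reward_rejected):
    """Bradley-Terry preference model: probability that the chosen
    answer beats the rejected one, given their reward scores."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))
```

Training pushes the reward model to make this probability high on human-labeled comparisons; the LLM is then tuned to produce answers that score well under that learned reward.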
WEEK 13
Once the model has learned, how does it choose its words? We'll look at straightforward, predictable methods like greedy decoding that form the basis of text generation.
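The simplest of these methods, greedy decoding, is one line of logic: always take the single most probable next word (a sketch with a toy vocabulary):

```python
def greedy_next(probs, vocab):
    """Greedy decoding: always pick the single most likely next word."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return vocab[best]

print(greedy_next([0.1, 0.7, 0.2], ["a", "b", "c"]))  # prints "b"
```

Run repeatedly, this always produces the same output for the same prompt — predictable, but often repetitive, which is exactly the problem next week's techniques solve.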
WEEK 14
To sound less robotic, models need a bit of randomness. Learn about the clever sampling techniques that allow for more creative, diverse, and human-like writing.
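Two of the workhorse techniques, temperature and top-k, fit in one function: temperature reshapes the distribution (low = safer, high = wilder), and top-k throws away everything but the k most likely words before sampling (a sketch over raw model scores, called logits):

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a next-token id: rescale logits by temperature,
    optionally keep only the top_k candidates, then draw one."""
    scaled = [l / temperature for l in logits]
    ids = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        ids = ids[:top_k]                      # drop the unlikely tail
    m = max(scaled[i] for i in ids)
    weights = [math.exp(scaled[i] - m) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

With `top_k=1` this collapses back to greedy decoding; loosen either knob and the same model starts writing differently every time.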
WEEK 15
Making models bigger is expensive. Discover the Mixture of Experts (MoE) architecture, a clever 'committee of specialists' approach that allows models to scale to trillions of parameters efficiently.
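The 'committee' works because a small gating network picks only a couple of specialists per token, so most of the model's parameters sit idle on any given input. A sketch with experts as simple functions (toy scalar inputs; real experts are full feed-forward networks):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, gate_scores, experts, top_k=2):
    """Mixture of Experts: run only the top_k highest-scoring experts
    for this token and mix their outputs by normalized gate weight."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i],
                    reverse=True)[:top_k]
    total = sum(probs[i] for i in chosen)       # renormalize over the chosen
    return sum(probs[i] / total * experts[i](x) for i in chosen)
```

Adding more experts grows the model's capacity without growing the compute per token — that's how trillion-parameter models stay affordable to run.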
WEEK 16
The journey ends by looking ahead. We'll explore how LLMs are breaking the text barrier to understand images, audio, and video, getting closer to a human-like understanding of the world.