Observable Pipeline

How LLMs Find Answers

Step inside the transformer architecture — from raw text to generated tokens, every layer made visible.

Click a step below to illuminate the corresponding layer

01

Tokenization

Text → Token IDs

text → [id₀, id₁, …, idₙ] ∈ ℤ⁺

LLMs never read raw characters. Before anything else, your text is split into sub-word tokens using Byte-Pair Encoding (BPE) — the same family of algorithm GPT-4 uses. A word like "backpropagation" may become ["back", "prop", "agation"] depending on how common those sub-strings are in the training data. Token IDs are just integers indexing a fixed vocabulary — roughly 50,000 entries for GPT-2 and GPT-3, about 100,000 for GPT-4.

GPT-4 vocabulary: ~100,000 tokens

"Hello world" = 2 tokens

"Backpropagation" ≈ 3–4 tokens

Token count drives API cost
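The greedy merge process behind BPE can be sketched in a few lines. This is a toy illustration: the merge table below is hypothetical (real models learn tens of thousands of ranked merges from data), and real tokenizers also handle byte-level fallback and special tokens.

```python
# Toy sketch of BPE tokenization. The merge table is hypothetical,
# not GPT-4's actual one; lower rank = learned earlier = more frequent.

def bpe_tokenize(word, merges):
    """Greedily apply ranked merges to a list of characters."""
    tokens = list(word)
    while len(tokens) > 1:
        # Score every adjacent pair by its merge rank (inf = unknown pair).
        pairs = [(merges.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(tokens, tokens[1:]))]
        rank, i = min(pairs)
        if rank == float("inf"):
            break  # no learnable merges remain
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]  # merge the pair
    return tokens

# Hypothetical merges that build up "back" and "prop" step by step.
merges = {("b", "a"): 0, ("c", "k"): 1, ("ba", "ck"): 2,
          ("p", "r"): 3, ("pr", "o"): 4, ("pro", "p"): 5}

print(bpe_tokenize("backprop", merges))  # → ['back', 'prop']
```

Because merges are ranked by training-corpus frequency, common words collapse into one token while rare words stay split into several — which is exactly why token count, not word count, drives API cost.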

Live Tokenizer

NInsideN shows the observable application-level pipeline — tokenization, retrieval, context assembly, and streaming. The model's private internal activations and weights are not accessible.