1  How to use this guide

Start here. This short chapter explains who the guide is for, how it is organized, what its badges mean (the coloured marks you’ll see everywhere), and where to start reading depending on what you’re after. It takes about three minutes and will save you time across the other forty chapters.

1.1 Who it’s for

For anyone who works with transformers and wants to understand them for real, not just use them: students, engineers, researchers. We start from scratch (what a token is) and reach the 2026 frontier. You need nothing more than high-school mathematics and curiosity: every formula is explained term by term, in plain language, and with an analogy before the equation.

1.2 What makes it different

There are excellent transformer textbooks. This one adds what no other has:

  • Attention over distance (Part II): an atlas of γ measured on 42 models, the decay law \(A(d)\propto d^{-\gamma}\) (attention \(A\) at token distance \(d\) falls off as a power law with exponent γ), and the KV window derived from it. This is data and theory you won’t find in any other guide.
  • A physical lens (Part III): thermodynamics, phases, and fractional transport of attention, honestly labelled (derived vs. analogy).
  • A formula audit (Part VII): verified vs. folklore vs. numerology, with formal proofs in Lean and open data, including our own errors and refutations.
  • An interactive tool (tafagent) and the map of the 2026 landscape.
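
To make the decay law in the first bullet concrete, here is a minimal sketch of how an exponent γ could be read off a curve of attention versus distance: take logarithms of both sides, so that \(A(d)\propto d^{-\gamma}\) becomes a straight line whose slope is \(-\gamma\). This is a generic least-squares fit on synthetic data, offered only as an illustration. It is not the procedure behind the atlas or tafagent, and the function name `fit_gamma`, the prefactor 0.5, and the exponent 0.8 are assumptions made up for the example.

```python
import numpy as np


def fit_gamma(distances, attention):
    """Estimate gamma in A(d) = c * d**(-gamma) with a straight-line fit in log-log space.

    Hypothetical helper for illustration only; not part of tafagent.
    """
    d = np.asarray(distances, dtype=float)
    a = np.asarray(attention, dtype=float)
    mask = (d > 0) & (a > 0)  # logarithms need strictly positive values
    # log A = log c - gamma * log d, so the fitted slope is -gamma
    slope, intercept = np.polyfit(np.log(d[mask]), np.log(a[mask]), 1)
    return -slope, np.exp(intercept)


# Synthetic curve with a known exponent (assumed values, not measured data)
d = np.arange(1, 513, dtype=float)
a = 0.5 * d ** (-0.8)

gamma_hat, c_hat = fit_gamma(d, a)
print(f"gamma = {gamma_hat:.3f}, prefactor = {c_hat:.3f}")
```

On noise-free synthetic data the fit should recover the exponent and prefactor it was built with; real attention curves are noisier, which is one reason a measured γ needs the kind of care Part II describes.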

1.3 How it’s organized

| Part | Chapters | What it covers |
|---|---|---|
| 0 | | This orientation. |
| I: Foundations | 1-12 | From text to a working transformer: tokens, embeddings, attention, FFN, normalization, RoPE, the block, architectures, training, inference. |
| II: Attention over distance | 13-20 | Our territory: reading maps, aliasing, the γ law, the atlas, sinks, taxonomy, long context, KV-cache. |
| III: Physical lens | 21-24 | Phases, thermodynamics, fractional dynamics, grokking. |
| IV: Training and adapting | 25-28 | Pre-training at scale, fine-tuning, alignment (RLHF/DPO), PEFT (LoRA). |
| V: Using | 29-33 | Generation, prompting, RAG, agents, multimodal. |
| VI: Efficiency | 34-36 | Efficient attention, compression, serving. |
| VII: What’s real | 37-40 | Interpretability, the audit, the 2026 map, ethics. |
| VIII: Reference | R1-R10 | Formula sheet, cookbook, glossary, measuring your γ, the tafagent manual, solutions. |

1.4 The badges: how to read the coloured boxes

The book uses coloured boxes with a consistent code. Learn it once (you’ll see a real example of each in any chapter):

  • 🧭 Where we are (blue box) — opens each chapter; places you on the map.
  • 🧩 Analogy (light-blue box) — the bridge to intuition, before the formula.
  • ✓ Verified (green box) — something solid, derived or proved; in the physical theory, 📐 Verified in Lean marks a formal proof.
  • ⚠ Honest (amber box) — a limit, a caveat, or a refutation (including our own).
  • 🧠 Curiosity — a surprising, verified fact.
  • 🧪 Try it — tafagent — how to check it yourself with the tool (with a link).
  • 📄 Our paper — an open link (Zenodo) to the data/derivations that back it.

Above all, there is the receipts system of Ch. 38: every claim is labelled verified (backed by a proof or data), folklore (popular but unjustified), or numerology (a fit with no mechanism behind it). The reader’s rule: ask for the receipt.

1.5 The honesty commitment

A single idea runs through the whole book: honesty is not proclaimed, it is demonstrated. That means you’ll see formulas that failed, our own errors corrected (C_V was given as /4; it is /12), and results flagged as 🟡 not validated or ✗ refuted. This is not a flaw: it’s the method. A textbook that only tells you its successes isn’t to be trusted.

1.6 Where to start reading (itineraries)

  • If you’re starting from scratch: read Part I in order; it is linear and self-contained.
  • If you already know the basics and want what’s unique: jump to Part II (attention over distance) and Part III (the physical lens), the heart of the book.
  • If you’re here to apply: Parts IV-VI (train, use, deploy) + the cookbook (R2).
  • If you’re here to look things up: Part VIII, with its formula sheet (R1), cookbook (R2), and glossary (R3), is meant for searching, not reading cover to cover.
  • If you care about “what’s real”: Part VII (interpretability, the audit, the 2026 map, ethics).

1.7 Conventions

  • The formulas stay, because they matter, but each one is explained: what each symbol does, in plain language, and what role it plays in the transformer.
  • Each chapter opens with “The idea in one sentence” and a Key concepts section that defines its terms and their role before developing them.
  • References go at the end of each chapter (only the ones it cites), clickable.
  • To see attention maps, use BertViz / Transformer Explainer; to measure/predict γ, horizon, regime, and KV, use tafagent.

Next (Chapter 1): the panorama, or what a transformer is and why it won.