46  R9 · tafagent manual

What it is. This is the manual for tafagent, the in-browser diagnostic tool that accompanies this book. It covers the modes, the metrics it computes, the recipes, and how to read its verdicts. The key point up front, to avoid misunderstandings: tafagent does NOT draw attention maps; it is a diagnostic that PREDICTS metrics from the model’s config. To see maps, use BertViz or Transformer Explainer; to measure or predict γ, horizon, regime, and KV, use tafagent.

46.1 What it is and what it isn’t

  • It is: a tool in the browser (zero install, zero GPU, no telemetry) that predicts the practical viability of an LLM before you spend GPU/€: real long context, quantization degradation, chat template, config errors.
  • It isn’t: an attention-map viewer, nor a service that runs the model. What you see are deterministic predictions (computed with Pyodide) plus a natural-language layer.
  • Input: an HF model id or a config.json (it reads θ, T_train, heads…); you set T_eval (the target length) and whether the model uses a sliding window (SWA).
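As a minimal sketch of what this input step involves, the snippet below pulls a few fields from a config.json using only the Python standard library. The key names (rope_theta, max_position_embeddings, num_attention_heads, num_key_value_heads, sliding_window) are ones commonly found in Hugging Face configs; mapping them to θ, T_train, heads and SWA is an assumption made here for illustration, not a description of tafagent’s own parser, and the sample values are invented.

```python
import json

# Hypothetical config.json content; the values are illustrative only.
SAMPLE_CONFIG = """
{
  "rope_theta": 10000.0,
  "max_position_embeddings": 4096,
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "sliding_window": null
}
"""

def read_inputs(config_text, t_eval):
    """Extract the fields a config-based diagnostic would need.

    Assumed mapping (not confirmed by the manual):
    theta <- rope_theta, t_train <- max_position_embeddings.
    """
    cfg = json.loads(config_text)
    heads = cfg.get("num_attention_heads")
    return {
        "theta": cfg.get("rope_theta"),
        "t_train": cfg.get("max_position_embeddings"),
        "heads": heads,
        # Models without grouped-query attention often omit this key.
        "kv_heads": cfg.get("num_key_value_heads", heads),
        # A missing or null sliding_window is treated as "no SWA".
        "uses_swa": cfg.get("sliding_window") is not None,
        # T_eval is supplied by the user, not read from the config.
        "t_eval": t_eval,
    }

inputs = read_inputs(SAMPLE_CONFIG, t_eval=32768)
print(inputs)
```

Keeping T_eval as a separate argument mirrors the split described above: the config supplies what the model was built with, and the user supplies the length they actually want to run.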

46.2 The metrics it computes

Metric | What it tells you | Ch.
γ_Padé | γ predicted from geometry (θ, T) | 15
γ_observed | γ measured from real weights | 15
d_horizon | effective attention horizon (how far it really attends) | 15, 19
η (θ_eff_obs/θ_eff_Padé) | regime: Normal / Fraud / Compressed / Over-Padé / SWA | 16
KV memory | cache memory at length L | 20, 36
L_NIAH | estimated needle-in-haystack ceiling | 19
Δγ | phase probe for induction heads | 24, 30
ΔPPL | perplexity shift from quantization | 35
Phase A / Phase B | γ<1 (global) vs γ>1 (local collapse) | 21
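
The KV memory metric lends itself to a back-of-the-envelope check. The sketch below uses the standard accounting for a decoder-only transformer: two tensors (keys and values) per layer, each holding kv_heads × head_dim elements per token. Whether tafagent uses exactly this formula is an assumption, and the function name and example numbers are hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_element=2, batch_size=1):
    """Estimate KV-cache size in bytes at sequence length seq_len.

    The factor 2 accounts for storing both keys and values.
    bytes_per_element=2 corresponds to fp16/bf16 storage.
    Sliding-window caps and KV compression are ignored here.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_element * batch_size)

# Hypothetical model: 32 layers, 8 KV heads, head_dim 128, 32768 tokens.
size = kv_cache_bytes(32, 8, 128, 32768)
print(f"{size / 2**30:.2f} GiB")  # → 4.00 GiB
```

Because the estimate is linear in seq_len, doubling the target length doubles the cache, which is why this number is worth checking before committing to a long-context deployment.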

46.3 The 7 modes

  1. 📇 Profile. Paste a model id → γ_Padé vs γ_observed, R², regime, horizon. The starting mode.
  2. 🆚 Compare. Pits two models against each other on the same axes.
  3. 🔍 Inspect config. Reads and explains the config.json (θ, heads, SWA…).
  4. 💬 Ask plain English. Ask in natural language (handled by a small model in-browser).
  5. 📋 Pick recipe. Choose an X-* recipe (below).
  6. 🩺 Diagnose CLI. Command-line-style diagnosis.
  7. 📊 Phase diagram. Places a panel of models on the γ axis (Phase A/B): the interactive atlas.
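
To make the Phase diagram mode concrete, here is a minimal sketch that sorts a panel along the γ axis and labels each model using the rule from the metrics table (γ<1 is Phase A, γ>1 is Phase B). The model names and γ values are made up, and the handling of γ exactly equal to 1 is an assumption, since the manual does not specify it.

```python
def phase(gamma):
    """Classify by the rule in the metrics table: γ<1 is A, γ>1 is B."""
    if gamma < 1.0:
        return "A (global)"
    if gamma > 1.0:
        return "B (local collapse)"
    # The manual does not say how γ == 1 is treated; flagged separately here.
    return "boundary"

# Hypothetical panel: names and γ values are invented for illustration.
panel = {"model-x": 0.82, "model-y": 1.15, "model-z": 0.97}

# Print the panel ordered along the γ axis, lowest first.
for name, gamma in sorted(panel.items(), key=lambda item: item[1]):
    print(f"{gamma:5.2f}  {name:8s}  Phase {phase(gamma)}")
```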

46.4 The Anti-Bullshit Pack (15 tools)

Diagnostics that attack the “smoke” of model cards: Context Unmasker (real long context vs advertised), Chat-template Sniffer, Quant-regime Classifier, Multilingual Tokenizer Tax (tokenizes real text with 6 tokenizers to show how much your language “costs”), Contamination Prior, LongScore (RULER+HELMET), PEFT Anti-Pattern, Spec-Decode, plus extensions: YaRN/RoPE planner, GGUF Bridge, Launch-Flag Generator.

46.5 The recipes (8 core)

Recipe | For | Ch.
X-1 | custom vs API | 25, 36
X-2 | long-context viability | 19
X-3 | budget pre-flight | 25
X-5 | hardware | 36
X-19 | KV compression (soft-decay/cutoff) | 20
X-21 | imprint purity | 15
X-22 | compute-context invariant | 34
X-23 | induction-head phase detector | 24, 30

46.6 How to read the output: the TAF Card

The result is summarized in a TAF Card with ✅ / ⚠ / ❌ verdicts per dimension (context, quantization, template, regime). There is also a falsification dashboard (F1-F23): it doesn’t just hand you numbers, it puts the claims to the test, following the same philosophy as Ch. 38.

A typical flow: Profile (paste the id) → read the verdict on the TAF Card → if something comes back ⚠/❌, open the specific diagnostic (e.g. Context Unmasker or Quant-regime) → use the matching recipe (X-2 context, X-19 KV) to decide.
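
The flow above can be sketched as a small triage step over the card’s verdicts. Everything here is hypothetical: tafagent’s real card format is not documented in this manual, the dictionary layout is invented, and only the context pairing (Context Unmasker, X-2) is taken directly from the text. The quantization and template diagnostics are guesses based on tool names, and dimensions with no named follow-up are left as None rather than filled in.

```python
# Follow-ups per dimension. Only the "context" row comes from the text;
# the other diagnostics are assumed pairings, and unknowns stay None.
FOLLOW_UPS = {
    "context": {"diagnostic": "Context Unmasker", "recipe": "X-2"},
    "quantization": {"diagnostic": "Quant-regime Classifier", "recipe": None},
    "template": {"diagnostic": "Chat-template Sniffer", "recipe": None},
    "regime": {"diagnostic": None, "recipe": None},
}

def triage(card):
    """Return (dimension, diagnostic, recipe) for every non-passing verdict."""
    todo = []
    for dimension, verdict in card.items():
        if verdict in ("⚠", "❌"):
            follow = FOLLOW_UPS.get(dimension, {})
            todo.append((dimension,
                         follow.get("diagnostic"),
                         follow.get("recipe")))
    return todo

# Hypothetical card: one failure, one warning, two passes.
card = {"context": "❌", "quantization": "✅", "template": "⚠", "regime": "✅"}
steps = triage(card)
for dimension, diagnostic, recipe in steps:
    print(dimension, diagnostic, recipe)
```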

46.7 Verification (the receipts)

  • Pyodide for the math (deterministic), WebLLM for the natural language, transformers.js for the tokenizer.
  • Lean+Mathlib verification: 15 identities formally proved (github.com/karlesmarin/lean-taf).
  • Open panel of 23 models (github.com/karlesmarin/tafagent-registry).

⚠ Warning: being honest about what is prediction and what is measurement

Many outputs (γ_Padé, estimated horizon, KV at length L) are predictions from the config, not measurements of real attention; they are fast and useful for deciding before you spend GPU. Once you have the model, compare them with γ_observed (measured) and with your own measurement (R4). And remember the book’s limits: D_f and the context headroom are 🟡 rules that are not fully validated (R1).

Next reference (R10): the suggested solutions to each chapter’s exercises.