🔬 TAF Agent

Test ANY transformer LLM before you spend GPU/$.

✓ RoPE-MHA ✓ RoPE-GQA ✓ ALiBi ✓ AbsPE ✓ SWA ✓ SSM (Mamba) ✓ Any HuggingFace public model

All computation runs locally in your browser. Free. Unlimited. Auditable.

Built by an independent researcher. Open source. Not affiliated with any model vendor.

📘 TAF Agent — User Manual

What does it do?

Predicts practical viability of any transformer LLM before you spend GPU/$. Answers questions like "will this model work at L=32K?" or "should I train custom or use API?" using deterministic Python formulas (TAF — Thermodynamic Attention Framework).

How to use — 7 modes

📇 Profile: paste a model id → all recipes at once = TAF Card. Best starting point.

🆚 Compare: 2–3 models side-by-side on the same recipe. Best when choosing between candidates.

🔍 Inspect config: paste a raw config.json → the tool parses it and runs a full Profile. For private models, in-development configs, or models not yet on HF Hub.

💬 Ask plain English: free-form question; an in-browser LLM picks the recipe. Best for casual exploration.

📋 Recipe + form: manual selection, full parameter control. Best when you want exact control.

🩺 Diagnose CLI: generates a Python command to measure γ on your local machine (transformers + numpy). Fast ≈5 min on CPU; full ≈20–60 min on GPU. The output JSON is re-uploadable via Inspect.

📊 Phase diagram: scatter plot of the 23 panel models on the (log θ, γ) plane. The Hagedorn line γ=1 separates Phase A from Phase B. Click a dot to load that model into the Recipe form.
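Conceptually, the Diagnose step reduces to fitting a power-law decay exponent to attention mass versus token distance. The sketch below is an illustrative assumption about that fit (a pure power-law profile and a log-log linear regression), not the tool's actual CLI code:

```python
import numpy as np

def fit_gamma(attn_row: np.ndarray) -> float:
    """Fit the attention-decay exponent gamma from one attention row.

    Assumes attention mass falls off roughly as distance**(-gamma), so
    gamma is the negative slope of log(weight) vs log(distance).
    Illustrative sketch only; the real CLI measures much more.
    """
    d = np.arange(1, len(attn_row) + 1, dtype=float)  # token distances
    w = np.clip(attn_row, 1e-12, None)                # avoid log(0)
    slope, _ = np.polyfit(np.log(d), np.log(w), 1)    # linear fit in log-log
    return -slope

# Synthetic check: a pure d**(-0.5) profile recovers gamma = 0.5
weights = np.arange(1, 513, dtype=float) ** -0.5
print(round(fit_gamma(weights), 3))  # → 0.5
```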

The 8 recipes available

X-1 Custom training vs API — compares cost of training your own model vs paying for API access.

Try: "Should I train an 8B custom model or use GPT-4o for 50M tokens/month?"
Answer types: YES (custom) / NO (API) with break-even months.
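The break-even arithmetic behind this verdict can be sketched in a few lines. All cost figures and the simple linear model here are illustrative assumptions, not the recipe's actual cost model:

```python
def break_even_months(train_cost_usd: float,
                      api_cost_per_mtok: float,
                      custom_cost_per_mtok: float,
                      mtok_per_month: float) -> float:
    """Months until a custom model's savings repay its training cost.

    Hypothetical linear model: one-off training cost vs a constant
    per-Mtok saving each month.
    """
    monthly_saving = (api_cost_per_mtok - custom_cost_per_mtok) * mtok_per_month
    if monthly_saving <= 0:
        return float("inf")  # API is cheaper forever -> verdict NO (API)
    return train_cost_usd / monthly_saving

# 50M tokens/month, $5/Mtok API vs $0.50/Mtok self-served, $100k training:
# savings are tiny relative to training cost, so the verdict leans NO (API).
print(round(break_even_months(100_000, 5.0, 0.5, 50), 1))  # → 444.4
```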

X-2 Long Context Viability — predicts if a model serves a target context length reliably.

Try: "Will Meta-Llama-3-8B handle 32000 tokens for retrieval?"
Chains: γ_Padé → decomposition → d_horizon → NIAH ceiling → hallucination → KV memory.
Verdict: YES / DEGRADED / NO with mitigation if needed.

X-3 Budget pre-flight — given a $ budget, what model is feasible to train?

Try: "I have $5000, what model can I train?"
Answer: GO / TINY-MODEL / MEMORY-LIMITED with concrete N (params) and D (tokens).
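A back-of-envelope version of this sizing uses the standard Chinchilla relations C ≈ 6·N·D with D = 20·N. The FLOPs-per-dollar figure below is a loose assumption, and this ignores utilization and memory limits, so it is not the recipe's actual formula:

```python
import math

def feasible_model(budget_usd: float, flops_per_usd: float = 1e18):
    """Chinchilla-style sizing sketch: C ~= 6*N*D with D = 20*N.

    flops_per_usd is an assumed effective rate (real quotes vary widely);
    overheads, utilization, and memory limits are ignored here.
    """
    C = budget_usd * flops_per_usd   # total training compute in FLOPs
    N = math.sqrt(C / 120.0)         # from 6*N*(20*N) = 120*N**2 = C
    D = 20.0 * N
    return N, D

N, D = feasible_model(5000)
print(f"N ~ {N:.2e} params, D ~ {D:.2e} tokens")
```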

X-5 Hardware selection — which GPU should you use to serve at a target throughput?

Try: "Cheapest hardware to serve Llama-3-8B at 10M tokens/day"
Answer: best GPU + $/Mtok + capacity vs target.
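The $/Mtok figure in the answer is a simple rate conversion. A sketch with hypothetical numbers (the recipe's real model also accounts for memory limits and batching, which this ignores):

```python
def usd_per_mtok(gpu_hourly_usd: float, tokens_per_sec: float) -> float:
    """Serving cost per million tokens for one GPU at a sustained throughput.

    Pure rate conversion; numbers in the example are illustrative.
    """
    tokens_per_hour = tokens_per_sec * 3600.0
    return gpu_hourly_usd / (tokens_per_hour / 1e6)

# e.g. a $2/hour GPU sustaining 2500 tok/s -> 9 Mtok/hour
print(round(usd_per_mtok(2.0, 2500.0), 3))  # → 0.222
```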

X-19 KV Compression decision — should you use soft decay, a hard cutoff, or literature methods?

Try: "How to compress KV cache for Qwen2.5-7B at 32K?"
Answer: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.
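The KV-memory side of this decision is standard arithmetic: 2 tensors (K and V) per layer per KV head. The config values below are assumptions for a Qwen2.5-7B-like model (28 layers, 4 KV heads, head_dim 128, bf16); verify against the real config.json:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: int = 2) -> float:
    """KV-cache size in GiB for one sequence.

    2 tensors (K and V) * layers * KV heads * head_dim * seq_len * bytes.
    """
    total = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
    return total / 1024**3

# Assumed Qwen2.5-7B-like config at 32K context, bf16
print(round(kv_cache_gb(28, 4, 128, 32_768), 2))  # → 1.75
```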

— v0.4 (session 29 findings) —

What's new in v0.4 (session 29 findings, 2026-04-28): three diagnostic recipes derived from cross-model panel analysis (n=22 LLMs).

X-21 Imprint Purity Diagnostic — predicts γ on RANDOM tokens via ν=−1/(2π); how clean is the model's RoPE prediction?

Try: "How clean is the RoPE prediction on Llama-3-8B?"
Answer: predicted γ_random + purity diagnostic (CLEAN / OVER-IMPRINTED / UNDER-IMPRINTED).

Learned-imprint slope ν = −1/(2π): the RoPE rotation period 2π drives a positional bias on weights, proportional to log(N_params). Even random tokens show this scaling. ν is DERIVED — not fitted (empirical err 0.3%).
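The scaling above can be sketched directly. The natural log and the zero intercept are my assumptions (the tool calibrates the intercept from the panel), so only the slope and the direction of the trend are meaningful here:

```python
import math

NU = -1.0 / (2.0 * math.pi)   # derived learned-imprint slope, ~ -0.159

def gamma_random(n_params: float, intercept: float = 0.0) -> float:
    """Predicted gamma on random tokens: linear in log(N_params), slope nu.

    nu = -1/(2*pi) is the derived quantity; the intercept is a placeholder,
    so absolute values from this sketch are illustrative only.
    """
    return NU * math.log(n_params) + intercept

# Bigger models -> stronger imprint -> lower gamma_random
print(gamma_random(70e9) < gamma_random(7e9))  # → True
print(round(NU, 4))                            # → -0.1592
```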

X-22 Compute-Context Invariant — does γ × log(N²·D) lie in the panel band 51.2 ± 16.8? Detects scaling/training anomalies.

Try: "Does Mistral-7B fit the compute-context invariant?"
Answer: K = γ·log(N²·D), z-score, IN-BAND or OUTLIER.

Chinchilla-attention invariant K: γ × log(N²·D) ≈ 51.2 ± 16.8 (CV=0.329). Connects compute scaling and the attention exponent into a single dimensionless number.
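A minimal sketch of the K check. Two assumptions on my part: the log is natural (which reproduces the band's magnitude for typical 7B-scale models), and the IN-BAND cutoff is |z| < 2, which may differ from the recipe's documented threshold:

```python
import math

K_MEAN, K_STD = 51.2, 16.8   # panel band quoted in the text

def invariant_k(gamma: float, n_params: float, d_tokens: float):
    """K = gamma * log(N**2 * D) with its z-score against the panel band.

    Natural log and the |z| < 2 cutoff are assumptions of this sketch.
    """
    K = gamma * math.log(n_params**2 * d_tokens)
    z = (K - K_MEAN) / K_STD
    verdict = "IN-BAND" if abs(z) < 2.0 else "OUTLIER"
    return K, z, verdict

# e.g. gamma = 0.7, N = 7e9 params, D = 2e12 training tokens
K, z, verdict = invariant_k(0.7, 7e9, 2e12)
print(f"K={K:.1f} z={z:+.2f} {verdict}")
```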

X-23 IH-Phase Detector — pre- or post-induction-head? Cheap probe via sign(γ_text − γ_random).

Try: "Is Qwen2.5-7B post-induction-head?"
Answer: CONFIRMED PRE-IH / CONFIRMED POST-IH / ANOMALY (with a size-vs-Δγ consistency check).

Δγ as IH probe: sign(γ_text − γ_random) > 0 ⟺ post-induction-head. Cheaper than running an in-context-learning benchmark.
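The sign test is a one-liner. The tie-breaking tolerance below is my assumption, and the recipe's additional size-vs-Δγ consistency check is not reproduced here:

```python
def ih_phase(gamma_text: float, gamma_random: float, tol: float = 1e-3) -> str:
    """Cheap induction-head phase probe from the sign of delta-gamma.

    The tolerance band for calling a tie is an assumption of this sketch.
    """
    delta = gamma_text - gamma_random
    if delta > tol:
        return "POST-IH"
    if delta < -tol:
        return "PRE-IH"
    return "AMBIGUOUS"

print(ih_phase(0.62, 0.48))  # → POST-IH
```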

γ-cluster on famous constants (intriguing, n=4): CodeLlama-13b γ=0.382 ≈ 1−1/φ (golden conjugate, err 0.0003); pythia-1.4b γ=0.705 ≈ 1/√2; Llama-2-7b γ=0.287 ≈ 1−1/√2; Mistral-Nemo γ=0.428 ≈ log₁₀(e). Caveat: could be coincidence.
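The numerical coincidences above are easy to check directly (measured γ values copied from the text, closed-form constants from the standard library):

```python
import math

# Each pair: measured gamma from the panel, candidate closed-form constant.
matches = [
    (0.382, 1 - 2 / (1 + math.sqrt(5))),  # CodeLlama-13b vs 1 - 1/phi
    (0.705, 1 / math.sqrt(2)),            # pythia-1.4b  vs 1/sqrt(2)
    (0.287, 1 - 1 / math.sqrt(2)),        # Llama-2-7b   vs 1 - 1/sqrt(2)
    (0.428, math.log10(math.e)),          # Mistral-Nemo vs log10(e)
]
for measured, constant in matches:
    print(f"{measured:.3f} vs {constant:.4f} (err {abs(measured - constant):.4f})")
```

All four errors come out below 0.01, consistent with the "intriguing but possibly coincidental" caveat.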

🆕 v0.4 — New diagnostics (session 31)

Four new diagnostic functions derived in session 31 (2026-04-30) from cross-of-crosses formula games + Socratic interrogation. Available in taf_browser.py §33.

Architectural Concentration — γ_text ≈ γ_Padé − 0.012·n_kv. Cross-panel correlational law (R²=0.30). Caveat: not a per-model predictor.
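The trend is a one-line estimate; per the caveat, it is a population-level correlation (R²=0.30), not a per-model predictor, and the example inputs below are hypothetical:

```python
def gamma_text_estimate(gamma_pade: float, n_kv: int) -> float:
    """Cross-panel trend gamma_text ~= gamma_Pade - 0.012 * n_kv.

    R^2 = 0.30: a correlational law across the panel, not a per-model
    predictor. Example values are illustrative.
    """
    return gamma_pade - 0.012 * n_kv

# e.g. a GQA model with 8 KV heads and gamma_Pade = 0.55
print(round(gamma_text_estimate(0.55, 8), 3))  # → 0.454
```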

PDI — Padé Deviation Index — PDI = d_horizon_obs/T_eval. Traffic light: green (≈1), orange (≫1), yellow (≪1), red (Phase B, negative).
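A sketch of the traffic-light mapping. The ±25% band around 1 used to separate "≈1" from "≫1"/"≪1" is my assumption; the tool's own thresholds may differ:

```python
def pdi_traffic_light(d_horizon_obs: float, t_eval: float,
                      band: float = 0.25) -> str:
    """Map PDI = d_horizon_obs / T_eval to a traffic-light color.

    The +/- 25% band around 1 is an assumed cutoff, not the tool's
    documented threshold.
    """
    pdi = d_horizon_obs / t_eval
    if pdi < 0:
        return "red"      # Phase B: negative horizon
    if pdi > 1 + band:
        return "orange"   # horizon far beyond eval length (PDI >> 1)
    if pdi < 1 - band:
        return "yellow"   # horizon well short of eval length (PDI << 1)
    return "green"        # PDI ~= 1

print(pdi_traffic_light(30_000, 32_768))  # → green
```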

4-bit Shift Predictor — MHA: R²(bf16) < 0.9 → γ rises; R² > 0.99 → γ drops. GQA: precision-robust regardless.

Critical Exponents Bundle — ν_c, β_c, η_c (= γ−1, CORRECTED), α_C, γ_susc, with an AM-GM minimum at γ = 1−1/√2 ≈ 0.293.

Adding new models (3 ways)

The audit chain

Every result shows the full Computation Chain — each formula step with its inputs, output, and interpretation. Click any step to expand. Cited section numbers (§26.1, §19.1, etc.) refer to the underlying paper for derivations.

The plain-English answer

After the deterministic chain runs, an in-browser LLM (Qwen2.5-0.5B, ~350MB cached after first load) synthesizes a plain-English summary. The numbers above are always correct (deterministic Python); the synthesis is LLM-generated — verify against the chain if in doubt.

Common parameters explained

What to look for in verdicts

Privacy

Everything runs in your browser. No telemetry, no analytics, no data sent anywhere. Even the LLM model runs locally via WebGPU/WebAssembly. Your model_ids and questions never leave this page.

Source & paper

Source code: github.com/karlesmarin/tafagent
Paper: Marin 2026 — Predicting How Transformers Attend (Zenodo; arXiv forthcoming)
Dataset: taf-attention-decay — 58 γ-measurements across 32 models (CC-BY-4.0)


🎯 Mode: four ways to use the tool.
📇 Profile: paste a model id → all 5 recipes at once = TAF Card.
🆚 Compare: 2–3 models side-by-side on one recipe.
💬 Ask: free-form question; the browser LLM picks the recipe.
📋 Recipe: manual selection with full form control.

Quickest start: paste any HuggingFace model id (e.g. meta-llama/Meta-Llama-3-8B), click Profile. See all 5 recipes scored in seconds.

💡 Quick start: pick any preset → click Generate. Or paste a model id from HF Hub trending → 📥 Fetch → Generate.

📇 Profile a model — one-click full diagnosis. Paste any HF model id (or pick a preset). The tool runs all 5 recipes (long-context, KV-compression, custom-vs-API, budget, hardware) and produces a single TAF Card showing the verdict per dimension + key numbers + architecture classification.

Use case: "I'm evaluating Qwen2.5-32B for production — what's its full viability profile?" → paste id → Profile → done.

For technicians: when you need a complete viability snapshot of a candidate model. Outputs match the format of the paper's γ-decomposition section.

📂 Import a shared TAF result

Got a JSON file from someone else's TAF analysis? Load it here to see the verdict + chain locally. Same view as if you'd run it yourself.

🌐 Recent community submissions

Live feed from the public registry. Click any submission to view the full analysis. Browse all →


🔬 Paper predictions — falsification status

The TAF framework rests on falsifiable predictions (F1-F23). Each is empirically tested. Here's the live status of every prediction in the paper.