✓ RoPE-MHA ✓ RoPE-GQA ✓ ALiBi ✓ AbsPE ✓ SWA ✓ SSM (Mamba) ✓ Any HuggingFace public model
All computation runs locally in your browser. Free. Unlimited. Auditable.
Built by an independent researcher. Open source. Not affiliated with any model vendor.
TAF Agent – User Manual
What does it do?
Predicts the practical viability of any transformer LLM
before you spend GPU time or dollars. Answers questions like "will this model work at L=32K?" or
"should I train custom or use an API?" using deterministic Python formulas (TAF = Thermodynamic Attention Framework).
How to use: 7 modes
Profile: paste a model id → all recipes at once = TAF Card. Best starting point.
Compare: 2-3 models side-by-side on the same recipe. Best when choosing between candidates.
Inspect config: paste raw config.json → the tool parses it and runs a full Profile. For private models, in-development configs, or models not yet on the HF Hub.
Ask plain English: free-form question; the in-browser LLM picks the recipe. Best for casual exploration.
Recipe + form: manual selection, full parameter control. Best when you want exact control.
Phase diagram: scatter plot of the 23 panel models on the (log θ, γ) plane. The Hagedorn line γ = 1 separates Phase A from Phase B. Click a dot to load that model into the Recipe form.
The 8 recipes available
X-1 Custom training vs API: compares the cost of training your own model vs paying for API access.
Try: "Should I train an 8B custom model or use GPT-4o for 50M tokens/month?"
Answer types: YES (custom) / NO (API) with break-even months.
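The break-even logic behind this verdict can be sketched in a few lines. All dollar figures below are hypothetical placeholders for illustration, not the tool's internal pricing data:

```python
# Illustrative sketch of the X-1 break-even idea (all numbers are
# assumptions, not the tool's internal constants).

def break_even_months(train_cost_usd, custom_serve_usd_mo, api_usd_mo):
    """Months until the one-off training cost is recovered by the monthly
    saving of serving a custom model instead of paying for an API."""
    saving = api_usd_mo - custom_serve_usd_mo
    if saving <= 0:
        return float("inf")  # API is cheaper every month: verdict NO (API)
    return train_cost_usd / saving

# Hypothetical 8B model vs a paid API at 50M tokens/month:
months = break_even_months(
    train_cost_usd=120_000,      # assumed one-off training spend
    custom_serve_usd_mo=3_000,   # assumed self-hosted serving cost
    api_usd_mo=15_000,           # assumed API bill at this volume
)
print(f"break-even after {months:.1f} months")  # 120000 / 12000 = 10.0
```

A short break-even horizon supports YES (custom); an infinite one means the API never gets beaten on cost.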
X-2 Long Context Viability: predicts whether a model serves a target context length reliably.
X-3 Budget pre-flight: given a $ budget, what model is feasible to train?
Try: "I have $5000, what model can I train?"
Answer: GO / TINY-MODEL / MEMORY-LIMITED with concrete N (params) and D (tokens).
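A back-of-the-envelope version of this pre-flight can be sketched with the standard Chinchilla heuristics C ≈ 6·N·D and D ≈ 20·N. These heuristics, and the GPU price, peak FLOP/s, and utilization figures below, are my assumptions; the tool's actual recipe may use different constants:

```python
import math

# Budget pre-flight sketch in the spirit of X-3, under assumed
# Chinchilla heuristics: C = 6*N*D total FLOPs, D = 20*N tokens.

def feasible_model(budget_usd, gpu_usd_per_hr=2.0, gpu_flops=312e12, mfu=0.4):
    """Largest (N, D) trainable within budget, given assumed GPU economics:
    $2/hr rental, 312 TFLOP/s peak (A100-like), 40% utilization."""
    hours = budget_usd / gpu_usd_per_hr
    C = hours * 3600 * gpu_flops * mfu   # total usable training FLOPs
    N = math.sqrt(C / 120)               # solve C = 6 * N * (20 * N)
    D = 20 * N
    return N, D

N, D = feasible_model(5000)
print(f"N ≈ {N/1e9:.2f}B params, D ≈ {D/1e9:.0f}B tokens")
```

Note this ignores the MEMORY-LIMITED branch entirely: a model can be compute-feasible yet not fit the available GPU memory.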
X-5 Hardware selection: which GPU should I use to serve at a target throughput?
Try: "Cheapest hardware to serve Llama-3-8B at 10M tokens/day"
Answer: best GPU + $/Mtok + capacity vs target.
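The two numbers in this verdict, $/Mtok and capacity vs target, reduce to simple arithmetic. The hourly price and throughput below are hypothetical, not measured:

```python
# Sketch of the X-5 cost metric: dollars per million tokens served,
# and a capacity check against a daily target. Both inputs (GPU hourly
# price, sustained tokens/sec) are assumed example values.

def usd_per_mtok(gpu_usd_per_hr, tokens_per_sec):
    tokens_per_hr = tokens_per_sec * 3600
    return gpu_usd_per_hr / (tokens_per_hr / 1e6)

def meets_target(tokens_per_sec, target_tokens_per_day):
    return tokens_per_sec * 86_400 >= target_tokens_per_day

# Hypothetical 8B model on a mid-range GPU:
print(usd_per_mtok(1.2, 2500))          # cost per million tokens
print(meets_target(2500, 10_000_000))   # capacity vs 10M tokens/day
```

Ranking candidate GPUs by `usd_per_mtok` among those where `meets_target` holds gives the "cheapest hardware" answer.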
X-19 KV Compression decision: should I use soft decay, hard cutoff, or literature methods?
Try: "How to compress KV cache for Qwen2.5-7B at 32K?"
Answer: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.
What's new in v0.4 (session 29 findings, 2026-04-28): three diagnostic recipes derived from cross-model panel analysis (n = 22 LLMs).
X-21 Imprint Purity Diagnostic: predicts γ on RANDOM tokens via ν = −1/(2π); how clean is the model's RoPE prediction?
Try: "How clean is the RoPE prediction on Llama-3-8B?"
Answer: predicted γ_random + purity diagnostic (CLEAN / OVER-IMPRINTED / UNDER-IMPRINTED).
Learned-imprint slope ν = −1/(2π): the RoPE rotation period 2π drives a positional bias on the weights, proportional to log(N_params). Even random tokens show this scaling. ν is DERIVED, not fitted (empirical error 0.3%).
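A minimal sketch of the purity classification, comparing a measured γ_random against the ν-based prediction. The 0.05 tolerance and the sign convention for OVER vs UNDER are my placeholders, not values from the tool:

```python
import math

# The derived learned-imprint slope (not fitted):
NU = -1 / (2 * math.pi)

def purity(gamma_measured, gamma_predicted, tol=0.05):
    """Classify imprint purity. Tolerance and sign convention are
    assumptions for illustration, not the tool's actual thresholds."""
    d = gamma_measured - gamma_predicted
    if abs(d) <= tol:
        return "CLEAN"
    return "OVER-IMPRINTED" if d > 0 else "UNDER-IMPRINTED"

print(f"nu = {NU:.4f}")       # -0.1592
print(purity(0.50, 0.48))     # within tolerance -> CLEAN
```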
X-22 Compute-Context Invariant: does γ × log(N²·D) lie in the panel band 51.2 ± 16.8? Detects scaling/training anomalies.
Try: "Does Mistral-7B fit the compute-context invariant?"
Answer: K = γ·log(N²·D), z-score, IN-BAND or OUTLIER.
Chinchilla-attention invariant K: γ × log(N²·D) ≈ 51.2 ± 16.8 (CV = 0.329). Connects compute scaling and the attention exponent into a single dimensionless number.
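The invariant check takes only a few lines. Treating the band as mean ± one standard deviation for the z-score cutoff, and using the natural log, are my assumptions about how the tool computes it:

```python
import math

# X-22 compute-context invariant check, using the panel band quoted
# above (51.2 ± 16.8). The |z| <= 1 cutoff and natural log are
# assumptions, not confirmed implementation details.

K_MEAN, K_SD = 51.2, 16.8

def invariant_check(gamma, n_params, n_tokens):
    K = gamma * math.log(n_params**2 * n_tokens)
    z = (K - K_MEAN) / K_SD
    return K, z, "IN-BAND" if abs(z) <= 1.0 else "OUTLIER"

# Hypothetical 7B model trained on 2T tokens with gamma = 0.8:
K, z, verdict = invariant_check(0.8, 7e9, 2e12)
print(f"K = {K:.1f}, z = {z:+.2f}, {verdict}")
```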
X-23 IH-Phase Detector: pre- or post-induction-head? A cheap probe via sign(γ_text − γ_random).
Δγ as IH probe: sign(γ_text − γ_random) > 0 ⟺ post-induction-head. Cheaper than running an in-context-learning benchmark.
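The probe itself is one comparison; the γ values below are hypothetical inputs:

```python
# X-23 probe as stated above: the sign of (gamma_text - gamma_random)
# indicates the induction-head phase.

def ih_phase(gamma_text, gamma_random):
    delta = gamma_text - gamma_random
    return "post-induction-head" if delta > 0 else "pre-induction-head"

print(ih_phase(0.95, 0.70))  # positive delta -> post-induction-head
```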
γ-cluster on famous constants (intriguing, n = 4): CodeLlama-13b γ = 0.382 ≈ 1 − 1/φ (golden conjugate, err 0.0003); pythia-1.4b γ = 0.705 ≈ 1/√2; Llama-2-7b γ = 0.287 ≈ 1 − 1/√2; Mistral-Nemo γ = 0.428 ≈ log_10(e). Caveat: could be coincidence.
v0.4 – New diagnostics (session 31)
Four new diagnostic functions derived in session 31 (2026-04-30) from cross-of-crosses formula games + Socratic interrogation. Available in taf_browser.py §33.
Critical Exponents Bundle: ν_c, β_c, η_c (= γ − 1, CORRECTED), α_C, γ_susc, with an AM-GM minimum at γ = 1 − 1/√2 ≈ 0.293.
Adding new models (3 ways)
Preset list: 11 curated popular models. Just select one from the dropdown.
HF Hub fetch: paste any model id (e.g. Qwen/Qwen2.5-32B-Instruct) and
click Fetch. The browser downloads config.json directly from HuggingFace and fills the form. Works for any public model.
Manual: fill the form fields directly with values from the model card.
The audit chain
Every result shows the full Computation Chain: each formula step with its inputs,
output, and interpretation. Click any step to expand. Cited section numbers (§26.1, §19.1, etc.) refer
to the underlying paper for the derivation.
The plain-English answer
After the deterministic chain runs, an in-browser LLM (Qwen2.5-0.5B, ~350MB cached after first load)
synthesizes a plain-English summary. The numbers above are always correct (deterministic Python);
the synthesis is LLM-generated; verify against the chain if in doubt.
Common parameters explained
θ (rope_theta): RoPE base frequency. Higher = more long-range capacity. Typical: 10000 (early models), 500000 (Llama-3), 1000000 (Qwen2.5).
T_train: max context the model was trained on. From max_position_embeddings.
T_eval: your target inference context length. The key knob.
n_kv_heads < n_attention_heads: the model uses GQA (Grouped Query Attention). Reduces KV memory but pushes γ toward Hagedorn.
has_SWA: model uses Sliding Window Attention (Mistral, gemma-2).
n_params: total parameter count. Threshold ~400M for induction-head emergence.
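As an illustration of where these parameters come from, here is a sketch that pulls them out of a HuggingFace-style config.json. The field names follow common HF conventions (some architectures use different keys), and the values are Llama-3-like examples:

```python
import json

# Map HuggingFace config.json fields to the form parameters above.
# Field names follow common HF conventions; some models differ
# (e.g. sliding_window signals SWA on Mistral-style configs).

raw = '''{
  "rope_theta": 500000.0,
  "max_position_embeddings": 8192,
  "num_attention_heads": 32,
  "num_key_value_heads": 8
}'''

cfg = json.loads(raw)
theta = cfg["rope_theta"]                # -> theta (rope_theta)
t_train = cfg["max_position_embeddings"] # -> T_train
is_gqa = cfg["num_key_value_heads"] < cfg["num_attention_heads"]
# GQA shrinks the KV cache by the ratio n_kv_heads / n_attention_heads:
kv_ratio = cfg["num_key_value_heads"] / cfg["num_attention_heads"]
print(theta, t_train, is_gqa, kv_ratio)  # 500000.0 8192 True 0.25
```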
What to look for in verdicts
YES / GO: proceed with confidence; the numbers support the choice.
DEGRADED / TINY-MODEL: works but with caveats; read the action.
NO / MEMORY-LIMITED: don't proceed as-is; a mitigation is provided.
Privacy
Everything runs in your browser. No telemetry, no analytics, no data sent anywhere. Even the LLM model
runs locally via WebGPU/WebAssembly. Your model_ids and questions never leave this page.
Mode: four ways to use the tool. Profile: paste a model id → all 5 recipes at once = TAF Card. Compare: 2-3 models side-by-side on one recipe. Ask: free-form question, the browser LLM picks the recipe. Recipe: manual selection with full form control.
Quickest start: paste any HuggingFace model id (e.g. meta-llama/Meta-Llama-3-8B),
click Profile. See all 5 recipes scored in seconds.
Quick start: pick any preset → click Generate. Or paste a model id from HF Hub trending → Fetch → Generate.
Profile a model: one-click full diagnosis. Paste any HF model id (or pick a preset).
The tool runs all 5 recipes (long-context, KV-compression, custom-vs-API, budget,
hardware) and produces a single TAF Card showing the verdict per
dimension + key numbers + architecture classification.
Use case: "I'm evaluating Qwen2.5-32B for production –
what's its full viability profile?" → paste id → Profile → done.
For technicians: when you need a complete viability snapshot
of a candidate model. Outputs match the paper's §sec:gamma_decomposition format.
Use case: you have a private model not on the HF Hub, or a config you're designing. Paste the raw JSON below and get a full TAF profile.
Architecture Inspector: paste any config.json directly. The tool parses it and runs the full Profile.
Useful for: private models, in-development configs, models not yet on HuggingFace,
or comparing what your custom architecture would do.
Paste the raw config.json contents. The tool extracts the architectural
parameters and runs the full 5-recipe Profile.
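For example, a minimal Llama-style config.json you could paste. The values are illustrative (Llama-3-8B-like), and a real config contains more fields than shown here:

```json
{
  "model_type": "llama",
  "hidden_size": 4096,
  "num_hidden_layers": 32,
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "max_position_embeddings": 8192,
  "rope_theta": 500000.0
}
```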
Try: paste 3 popular 7-8B models (Meta-Llama-3-8B, Mistral-7B-v0.1, Qwen/Qwen2.5-7B), pick recipe X-2, T_eval = 16000. See which best handles long context.
Compare models side-by-side: same recipe, multiple models. Pick 2-3 candidate models and
one recipe. See verdicts in a single comparison table.
Use case: "I need long-context retrieval at 16K – which is
best: Llama-3-8B, Mistral-7B, or Qwen-7B?" → pick 3 + X-2 + 16K → see the winner.
For technicians: when choosing between 2-3 candidate models for
a specific deployment scenario. Compare their verdicts on the same recipe.
For X-2 / X-19 only. The context length all compared models will be
evaluated at. Other recipes use their own params.
Output: γ_obs, R², phase, KV cache budget D_90, KL anomaly,
full thermodynamic profile (Z, U, S, F, C_V, χ). Saved as JSON.
Pick options below and copy-paste the generated command on your local
machine (Python + transformers + numpy). Total wall time ≈ 5 min in
--fast mode on CPU; full mode 20–60 min on GPU.
Generated command:
Next steps:
(1) git clone https://github.com/karlesmarin/tafagent
(2) cd tafagent && pip install torch transformers numpy
(3) Run the command above.
(4) The result JSON lands in ./diagnose_results/ → upload it
to the Pick recipe mode (or paste it in Inspect config) for the full TAF analysis.
Phase diagram (γ × θ)
Each dot is one model from the paper's empirical panel
(data/master_gamma_results.json). The x-axis is the RoPE base θ
on a log scale; the y-axis is the measured γ.
The Hagedorn line γ = 1 separates Phase A (γ < 1, global) from
Phase B (γ > 1, local-collapsed).
Hover dots for details; click to populate the recipe form.