The price of a token, from first principles.

A live calculator and visual guide to inference economics on real GPUs. Same physics every provider faces; same numbers, surfaced.

Default scenario: Llama-3-70B FP8 · 2× H100 · 32 concurrent requests.
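The scenario line determines most of the VRAM budget shown below it. This sketch makes several assumptions: FP8 weights take one byte per parameter, the model has Llama-3-70B's published shape (80 layers, 8 key/value heads of dimension 128), the KV cache is also stored in FP8, and all 32 concurrent requests hold a full 500 + 500 tokens. The helper names are hypothetical. Under those assumptions it recovers the 70 GB of weights and roughly 5.2 GB of KV cache. The 8 GB activation figure is copied from the page, not derived.

```python
# Hypothetical helpers reproducing the default scenario's memory budget.
GB = 1e9  # decimal gigabytes, matching the page's "GB"

def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Weight footprint: FP8 stores one byte per parameter."""
    return n_params * bytes_per_param

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   bytes_per_elem: float, tokens: int) -> float:
    """K and V tensors (hence the factor 2) for every layer and cached token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens

vram = 2 * 80 * GB                            # 2x H100, 80 GB each
weights = weight_bytes(70e9, 1)               # Llama-3-70B at FP8
kv = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                    bytes_per_elem=1,         # assumption: FP8 KV cache
                    tokens=32 * (500 + 500))  # assumption: 32 full requests
activations = 8 * GB                          # taken from the page as given
free = vram - weights - kv - activations

print(f"weights {weights / GB:.1f} GB, KV {kv / GB:.2f} GB, free {free / GB:.1f} GB")
print("fits" if free >= 0 else "does not fit")
```

The KV term scales linearly with both concurrency and tokens per request. As either grows, it is the term that eats into the free headroom.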

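Example request: "What is photosynthesis?"
Tokenize: the text becomes 6 sub-word tokens ("photosynthesis" alone splits into several). The scenario itself prices a 500-token prompt, prefilled in 79 ms.
Transformer layer × 80: attention (Q · K · V) and FFN (up · down). About 0.9 GB of weights per layer streams from HBM at 3.35 TB/s for every generated token.
VRAM · 160 GB across 2× H100 · fits ✓: weights 70 GB · KV cache 5.2 GB · activations 8 GB · free 76.8 GB.
Output: "Plants convert sunlight into energy." The scenario generates 500 output tokens at 40 tok/s, which is a 12.5 s decode.
Request timeline (wall-clock and compute): prefill takes 79 ms at 95% compute, and decode takes 12.5 s at 1% compute. During decode the compute units sit idle 99% of the time because the weights have to stream from HBM for every token. The true wall-clock ratio is about 1:158, which the chart compresses.
Cost: $0.0173 per request · $1.08 per million output tokens · $0.22 per million input tokens.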

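The timeline figures can be reproduced with the same kind of arithmetic. This is a minimal sketch under stated assumptions. First, tensor parallelism splits the 70 GB of weights evenly across both GPUs, so each decode step reads 35 GB per GPU at 3.35 TB/s. That gives a bandwidth floor per step. Second, the 32 streams advance one token per batched step, so 40 tok/s per stream means 25 ms per step. The cost function is also hypothetical. It rents the 2 GPUs at an hourly price `gpu_hour_usd` that does not come from the page, then spreads that cost over the aggregate output throughput. The calculator's own pricing formula may differ.

```python
# Sketch of decode timing and a hypothetical per-token price for the default scenario.
GB, TB = 1e9, 1e12

weights = 70 * GB
n_gpus = 2
hbm_bw = 3.35 * TB        # H100 HBM bandwidth, bytes/s
concurrency = 32
tok_per_s = 40            # per-request decode speed
out_tokens = 500
prefill_s = 0.079

# Each decode step streams all weights from HBM once (assumed even split across GPUs).
step_floor_s = (weights / n_gpus) / hbm_bw
decode_s = out_tokens / tok_per_s
ratio = decode_s / prefill_s
aggregate_tok_per_s = concurrency * tok_per_s

def usd_per_million_out(gpu_hour_usd: float) -> float:
    """Hypothetical pricing: cluster rental cost divided by total output tokens."""
    cluster_usd_per_s = n_gpus * gpu_hour_usd / 3600
    return cluster_usd_per_s / aggregate_tok_per_s * 1e6

print(f"bandwidth floor {step_floor_s * 1e3:.1f} ms/step vs observed {1e3 / tok_per_s:.0f} ms")
print(f"decode {decode_s:.1f} s, about {ratio:.0f}x the prefill")
print(f"aggregate {aggregate_tok_per_s} tok/s")
```

The floor ignores KV-cache reads, traffic between the two GPUs, and kernel overheads. That is one reason the observed 25 ms per step sits above it. The cost sketch also ignores prefill time and idle capacity between requests.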