Token IDs
Text becomes integers from a fixed vocabulary; no compute, no weight bytes.
The model never sees raw text. The tokenizer maps a string into integer IDs from a fixed vocabulary V (e.g. 128,256 subwords for Llama-3). Each token is a single int32, four bytes, regardless of model size. **The work here is zero**: no FLOPs, no weight bytes, just an index into a lookup table.
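As a sketch of what "zero work" means in practice, the snippet below tokenizes a string with tiktoken's `cl100k_base` encoding (used here only as a stand-in; Llama-3 ships its own 128,256-entry vocabulary) and counts the bytes the IDs occupy.

```python
# pip install tiktoken -- cl100k_base stands in for Llama-3's tokenizer here
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "The model never sees raw text."
ids = enc.encode(text)        # string -> list of vocabulary indices

print(ids)                    # a short list of integers, one per subword
print(enc.n_vocab)            # size of the fixed vocabulary V
print(len(ids) * 4, "bytes")  # one int32 per token, independent of model size
```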
But the *vocabulary size* you choose here echoes through the rest of the model. Embedding parameters scale as V × hidden_dim, and the LM head re-uses the same V at the end. A larger vocabulary means each token covers more text, so a request needs fewer tokens and shorter sequences, while the per-layer cost of each token stays the same. Pick V before the rest of the architecture; everything downstream is sized against it.
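A rough sketch of that scaling, assuming Llama-3-8B-like dimensions (V = 128,256, hidden_dim = 4096) and an untied LM head; the numbers are illustrative, not taken from the calculator.

```python
# Hypothetical dimensions for illustration (roughly Llama-3-8B-shaped).
V = 128_256        # vocabulary size
hidden_dim = 4096  # model width

embedding_params = V * hidden_dim  # input embedding table
lm_head_params = V * hidden_dim    # output projection, if not weight-tied

print(f"embedding: {embedding_params / 1e6:.0f} M params")  # ~525 M
print(f"lm head:   {lm_head_params / 1e6:.0f} M params")    # ~525 M
```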
Try it in the calculator: change *quantization* and watch how the embedding bytes scale, while the Token IDs stage stays at zero.
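A sketch of what that sweep looks like, reusing the same illustrative dimensions: the bytes-per-weight setting moves the embedding-table size, while the Token IDs stage owns no weights at any precision.

```python
V, hidden_dim = 128_256, 4096  # illustrative Llama-3-8B-like shape

for name, bytes_per_weight in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    emb_bytes = V * hidden_dim * bytes_per_weight
    token_id_weight_bytes = 0  # the Token IDs stage has no weights to quantize
    print(f"{name}: embedding {emb_bytes / 1e9:.2f} GB, "
          f"token IDs {token_id_weight_bytes} B")
```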