Calculator

Pick a model and hardware, then move the sliders. The numbers below recompute as you adjust them.

Model
Hardware
GPUs
Avg input tokens
Avg output tokens
Concurrent users
Max context for memory (caps KV budget)

Memory

capacity
Weights: [0.0 GB]
KV/token: [0.0 KB]
Concurrent (req · max): 0 · [0]
Context used
Free VRAM
Activation overhead: [0.0 GB]
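
The page does not state how it derives these memory figures. The sketch below is one common back-of-envelope approach, not the calculator's own code: weights are parameters times bytes per parameter, the KV cache holds one key and one value vector per layer per token, and the number of requests that fit is the leftover VRAM divided by the KV cache of one full-context request. All function and parameter names here are hypothetical.

```python
def weights_gb(n_params: float, bytes_per_param: float) -> float:
    """Weight memory in GB (assumes 1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: float = 2.0) -> float:
    """KV-cache bytes per token: one key and one value vector per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_concurrent(vram_gb: float, weights: float, overhead_gb: float,
                   kv_per_token: float, context_tokens: int) -> int:
    """Requests that fit when each one reserves a full context of KV cache.

    Assumption: VRAM left after weights and activation overhead is
    available to the KV cache in its entirety.
    """
    free_bytes = (vram_gb - weights - overhead_gb) * 1e9
    if free_bytes <= 0:
        return 0
    return int(free_bytes // (kv_per_token * context_tokens))
```

For a hypothetical 8B-parameter model in 16-bit precision with 32 layers, 8 KV heads, and a head dimension of 128, this gives 16 GB of weights and 131,072 bytes of KV cache per token. With 80 GB of VRAM, 4 GB of overhead, and an 8,192-token context, 55 requests fit.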

Throughput

bandwidth
Prefill tok/s: [0]
Decode/user tok/s: [0.0]
Aggregate tok/s: [0]
MFU prefill: [0%]
BW efficiency: [0%]
Effective batch: [0]
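
The formulas behind the throughput panel are not shown on the page. As a rough illustration only, a roofline-style estimate treats decode as memory-bound (each step reads the weights plus the KV cache in flight) and measures prefill MFU against peak compute, using the usual approximation of about 2 FLOPs per parameter per token for a dense model. The names and the model below are assumptions, not the calculator's implementation.

```python
def decode_tok_s_per_user(weight_bytes: float, kv_bytes_in_flight: float,
                          mem_bw_bytes_s: float,
                          bw_efficiency: float = 1.0) -> float:
    """Tokens per second seen by one user, assuming memory-bound decode.

    Each decode step is assumed to read all weights and the whole
    in-flight KV cache once at the achieved memory bandwidth.
    """
    step_s = (weight_bytes + kv_bytes_in_flight) / (mem_bw_bytes_s * bw_efficiency)
    return 1.0 / step_s


def aggregate_tok_s(per_user_tok_s: float, batch: int) -> float:
    """Every request in the batch emits one token per decode step."""
    return per_user_tok_s * batch


def prefill_mfu(n_params: float, prefill_tok_s: float, peak_flops: float) -> float:
    """Model FLOPs utilisation: achieved FLOP/s over peak FLOP/s.

    Assumes ~2 FLOPs per parameter per token and ignores attention FLOPs.
    """
    return 2 * n_params * prefill_tok_s / peak_flops
```

With hypothetical inputs of 16 GB of weights, 4 GB of KV cache in flight, and 2 TB/s of bandwidth, each step takes 10 ms, so each user sees about 100 tok/s and a batch of 8 yields about 800 tok/s in aggregate.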

Latency

time
TTFT: [0] ms
ITL: [0.0] ms
End-to-end
Effective batch: [0]
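
How the page combines these latency figures is not stated. A simple approximation, assuming no queueing delay, is that time to first token (TTFT) is the prompt length divided by prefill throughput, inter-token latency (ITL) is the inverse of per-user decode throughput, and end-to-end time is TTFT plus one ITL for every output token after the first. The function names are hypothetical.

```python
def ttft_ms(input_tokens: int, prefill_tok_s: float) -> float:
    """Time to first token in ms, assuming the prompt is prefilled in one pass."""
    return 1000.0 * input_tokens / prefill_tok_s


def itl_ms(decode_tok_s_per_user: float) -> float:
    """Inter-token latency in ms for a single user's stream."""
    return 1000.0 / decode_tok_s_per_user


def end_to_end_ms(ttft: float, itl: float, output_tokens: int) -> float:
    """The first token arrives at TTFT; each later token adds one ITL."""
    return ttft + max(output_tokens - 1, 0) * itl
```

For example, a 1,000-token prompt at 20,000 prefill tok/s gives a 50 ms TTFT. At 100 decode tok/s per user the ITL is 10 ms, so a 256-token reply takes about 2,600 ms end to end.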

Cost

economics
$/M input: [$0.00]
$/M output: [$0.00]
$/request: [$0.0000]
GPU $/hr (total): [$0.00]
Monthly (730 hr): [$0]
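
The only cost relationship the page spells out is the 730-hour month. The rest of this sketch is an assumption about how such figures are typically derived: the total hourly price is the per-GPU price times the GPU count, the cost per million tokens is the hourly price spread over the tokens produced in an hour, and the cost of a request is its input and output tokens priced at their respective rates. All names are hypothetical.

```python
HOURS_PER_MONTH = 730  # as labelled in the cost panel


def total_gpu_hourly(price_per_gpu_hr: float, n_gpus: int) -> float:
    """Total hourly price across all GPUs."""
    return price_per_gpu_hr * n_gpus


def monthly_cost(gpu_hourly_total: float) -> float:
    """Cost of running continuously for a 730-hour month."""
    return gpu_hourly_total * HOURS_PER_MONTH


def cost_per_million_tokens(gpu_hourly_total: float, tok_s: float) -> float:
    """Hourly price divided by tokens per hour, scaled to one million tokens."""
    return gpu_hourly_total / (tok_s * 3600) * 1e6


def request_cost(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Input and output tokens priced at their per-million rates."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1e6
```

With two GPUs at a hypothetical $2.50 per hour each, the total is $5.00 per hour and $3,650 per month. At 800 aggregate tok/s, that works out to roughly $1.74 per million output tokens.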