Calculator

Pick a model and hardware, then move the sliders. The numbers below recompute as you change them.

Model

Hardware

GPUs

Avg input tokens

Avg output tokens

Concurrent users

Max context for memory (caps the KV budget)

capacity

—

bandwidth

Prefill tok/s: [0]

Decode/user tok/s: [0.0]

Aggregate tok/s: [0]

time

TTFT: [0] ms

ITL: [0.0] ms

End-to-end: —

economics

$/M input: [$0.00]

$/M output: [$0.00]

$/request: [$0.0000]
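
The page does not state the formulas behind these outputs, so the following is only a rough sketch of how such figures are commonly related, not the calculator's actual implementation. It assumes that $/request is the input and output token counts priced at the $/M rates, that end-to-end time is TTFT plus one ITL for each output token after the first, and that aggregate throughput is per-user decode speed times concurrent users. All function and parameter names are hypothetical.

```python
def request_cost(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Cost of one request in USD, given prices per million tokens (assumed formula)."""
    return (input_tokens / 1e6) * usd_per_m_input + (output_tokens / 1e6) * usd_per_m_output


def end_to_end_ms(ttft_ms, itl_ms, output_tokens):
    """Total latency in ms: the first token arrives at TTFT, and each
    remaining token adds one inter-token latency (assumed formula)."""
    return ttft_ms + max(output_tokens - 1, 0) * itl_ms


def aggregate_tok_s(decode_per_user_tok_s, concurrent_users):
    """Aggregate decode throughput, assuming every user decodes at the same rate."""
    return decode_per_user_tok_s * concurrent_users


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the calculator.
    print(request_cost(1000, 500, 1.0, 2.0))
    print(end_to_end_ms(200.0, 20.0, 101))
    print(aggregate_tok_s(30.0, 8))
```

Under these assumptions, raising the average output tokens slider would lengthen end-to-end time and raise $/request roughly linearly, while TTFT would depend mainly on input length and prefill speed.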