Gemma 4 · client-side
WebGPU / Transformers.js

A chat, running entirely in your browser.

No inference server, no API key, no chat data leaving this tab. The model weights download once, are cached in your browser, then run on your GPU via WebGPU.
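As a rough illustration of the setup described above, the sketch below checks for the WebGPU entry point and, if present, produces the settings a loader would need. The model id and `q4f16` quantization are the ones listed on this page; the names `NavigatorLike`, `hasWebGPU`, and `planLoad` are hypothetical helpers, not part of Transformers.js.

```typescript
// Minimal structural type so the sketch compiles without WebGPU typings.
interface NavigatorLike {
  gpu?: unknown;
}

// Settings a loader would receive (field names mirror common
// Transformers.js options; treat them as an assumption here).
interface LoadOptions {
  device: "webgpu";
  dtype: "q4f16";
}

// Model id as listed on this page.
const MODEL_ID = "onnx-community/gemma-4-E2B-it-ONNX";

/** True when the browser exposes the WebGPU entry point (navigator.gpu). */
function hasWebGPU(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined && nav.gpu !== null;
}

/** Returns the load settings, or null when WebGPU is unavailable. */
function planLoad(
  nav: NavigatorLike
): { model: string; options: LoadOptions } | null {
  if (!hasWebGPU(nav)) return null;
  return { model: MODEL_ID, options: { device: "webgpu", dtype: "q4f16" } };
}
```

In a page, `planLoad` would be called with the browser's `navigator` object, and a non-null result could be handed to a Transformers.js loader such as `pipeline("text-generation", plan.model, plan.options)`. Whether that pipeline is the right entry point for this particular model is an assumption; the model card is the place to check. Note too that the presence of `navigator.gpu` does not guarantee a usable GPU, since `navigator.gpu.requestAdapter()` can still resolve to `null`.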

Pick a size above. E2B is faster to download and run; E4B is noticeably smarter. Click Load model in the composer, wait for the cache to fill, then chat.
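"Wait for the cache to fill" usually means watching several files download at once. The sketch below shows one way a page might fold per-file progress events into a single percentage. It assumes events shaped like those Transformers.js passes to its `progress_callback` option (`status`, `file`, `loaded`, `total`); `DownloadEvent` and `DownloadTracker` are hypothetical names for illustration.

```typescript
// Assumed shape of a download progress event (an assumption, not a spec).
interface DownloadEvent {
  status: string;   // e.g. "initiate", "progress", "done"
  file?: string;    // file the event refers to
  loaded?: number;  // bytes received so far
  total?: number;   // total bytes for this file
}

class DownloadTracker {
  // Latest byte counts per file, keyed by file name.
  private files: Record<string, { loaded: number; total: number }> = {};

  /** Records one event and returns the overall percentage (0 to 100). */
  update(ev: DownloadEvent): number {
    if (ev.status === "progress" && ev.file && ev.total) {
      this.files[ev.file] = { loaded: ev.loaded ?? 0, total: ev.total };
    }
    return this.percent();
  }

  /** Overall percentage across every file seen so far. */
  percent(): number {
    let loaded = 0;
    let total = 0;
    for (const name of Object.keys(this.files)) {
      loaded += this.files[name].loaded;
      total += this.files[name].total;
    }
    return total === 0 ? 0 : Math.round((loaded / total) * 100);
  }
}
```

A page could pass `(ev) => tracker.update(ev)` as the progress callback and render the returned number in the composer. Because a file's size is only known once its download starts, the percentage can step backwards when a new file appears; a real UI may want to smooth that.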

Runtime: Transformers.js · WebGPU
Weights: onnx-community/gemma-4-E2B-it-ONNX · q4f16
First load: ~1–3 minutes, depending on connection and hardware
After cache: Instant startup, fully offline-capable
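To tell a first load from a cached one, a page can look for the model's files in the browser's Cache API. The sketch below assumes Transformers.js stores downloads in a cache named `transformers-cache`, keyed by the file's URL; both are assumptions worth verifying against the library version in use. `CacheLike`, `CacheStorageLike`, `filesForModel`, and `cachedModelFiles` are hypothetical names.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface CacheLike {
  keys(): Promise<ReadonlyArray<{ url: string }>>;
}
interface CacheStorageLike {
  has(name: string): Promise<boolean>;
  open(name: string): Promise<CacheLike>;
}

// Assumption: the cache bucket Transformers.js uses in the browser.
const CACHE_NAME = "transformers-cache";

/** Pure helper: keeps only the URLs that belong to the given model repo. */
function filesForModel(
  urls: ReadonlyArray<string>,
  modelId: string
): string[] {
  const needle = "/" + modelId + "/";
  return urls.filter((u) => u.indexOf(needle) !== -1);
}

/** Lists cached files for a model; an empty list means nothing is cached. */
async function cachedModelFiles(
  storage: CacheStorageLike,
  modelId: string
): Promise<string[]> {
  if (!(await storage.has(CACHE_NAME))) return [];
  const cache = await storage.open(CACHE_NAME);
  const requests = await cache.keys();
  return filesForModel(requests.map((r) => r.url), modelId);
}
```

In the browser this might be called as `cachedModelFiles(caches, "onnx-community/gemma-4-E2B-it-ONNX")`. A non-empty result only shows that some files are present, not that the download completed, so it is a hint for the UI rather than a guarantee of offline readiness.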