SmolLM2 · client-side
CPU / WebAssembly / Transformers.js
idle
CPU-only build. Runs entirely in your browser via WebAssembly — no GPU required, no data leaves this tab. Apache 2.0 license, fully clear for commercial use. Expect 3–10 tokens/sec for 360M or 1–3 tokens/sec for 1.7B on a modern desktop CPU.

A chat, running entirely on your CPU.

SmolLM2 by Hugging Face — small, Apache-2.0, purpose-built for on-device use. The 360M variant is snappy enough for interactive chat; the 1.7B is noticeably smarter but slower.

Pick a size above. Click Load model, wait for the download to finish, then chat. After the first load the weights are cached — subsequent visits skip the download entirely.

Runtime
Transformers.js v3 · WebAssembly (CPU)
Weights
HuggingFaceTB/SmolLM2 · q8 quantization
License
Apache 2.0 (commercial use OK)
Context
8K tokens
First load
~30 sec (360M) / ~3 min (1.7B)
After cache
Instant startup, fully offline-capable
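For readers curious what "Transformers.js v3 · WebAssembly" means in code, here is a minimal loading sketch using the Transformers.js v3 `pipeline` API. The full repo name (`HuggingFaceTB/SmolLM2-360M-Instruct`) and the prompt are assumptions for illustration; the page above only names `HuggingFaceTB/SmolLM2` with q8 weights. This runs in a browser module context, not in Node as-is.

```javascript
// Hypothetical sketch of how a page like this might load SmolLM2
// with Transformers.js v3. Repo name and options are assumptions.
import { pipeline } from "@huggingface/transformers";

// device: "wasm" selects the CPU/WebAssembly backend (no GPU needed);
// dtype: "q8" selects the 8-bit quantized weights. Downloaded files are
// cached by the browser, so later visits start without a network fetch.
const generator = await pipeline(
  "text-generation",
  "HuggingFaceTB/SmolLM2-360M-Instruct", // assumed 360M instruct variant
  { device: "wasm", dtype: "q8" }
);

// Chat-style call: Transformers.js accepts a messages array and applies
// the model's chat template before generation.
const messages = [
  { role: "user", content: "Explain WebAssembly in one sentence." }
];
const out = await generator(messages, { max_new_tokens: 128 });
console.log(out[0].generated_text.at(-1).content);
```

The first call triggers the download (~30 sec for 360M on a typical connection, per the figures above); every call after that reads from the local cache, which is what makes the demo offline-capable once loaded.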