CPU-only build. Runs entirely in your browser via WebAssembly — no GPU required, no data leaves this tab. Apache 2.0 license, fully clear for commercial use. Expect 3–10 tokens/sec for 360M or 1–3 tokens/sec for 1.7B on a modern desktop CPU.
A chat, running entirely on your CPU.
SmolLM2 by Hugging Face — small, Apache-2.0, purpose-built for on-device use. The 360M variant is snappy enough for interactive chat; the 1.7B is noticeably smarter but slower.
Pick a size above. Click Load model, wait for the cache to fill, then chat. After the first load the model is cached — subsequent visits skip the download.