Technical Offline Pack guide

Local AI text-to-speech for browser-based reading

Local TTS is a runtime problem as much as a voice problem. The useful question is not just which model sounds best, but whether it loads quickly, runs on real devices, handles long text, and falls back cleanly to cloud voices.

Why this matters

People searching for local AI text-to-speech often want implementation detail: which engines can run locally, what hardware is needed, and how local voices compare to cloud TTS.

WebGPU, WebAssembly, and ONNX-style runtimes such as ONNX Runtime Web are making browser AI experiments more practical.

Local TTS can reduce cloud cost for repeated long listening, but a large model download and slow first audio can hurt activation, because new users may leave before they hear anything.

Developers and power users need a clear comparison between the browser Speech API, local neural voices, and managed cloud voices.

Honest status

Sornic production audio uses cloud TTS today. Local AI TTS is being evaluated as a future optional mode, not as a fully available feature.

What works today

  1. Paste text or Markdown into Sornic and generate cloud audio immediately.

  2. Use cloud voices when quality, speed, multilingual support, or MP3 download matters.

  3. Join the waitlist if you want to test local engines once Sornic can detect device capability.

What offline mode would add

  1. Detect browser features such as WebGPU and practical memory availability.

  2. Download a compatible local TTS model such as Kokoro or a fallback such as Piper.

  3. Chunk long text, synthesize locally, and fall back to cloud when the device is too slow.
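
The "too slow" condition in step 3 needs a concrete trigger. One common measure is the real-time factor: time spent synthesizing divided by the duration of the audio produced. The sketch below is an assumption about how such a check could look, not Sornic's implementation. The `ChunkTiming` shape, the function names, and the 0.8 threshold are all hypothetical.

```typescript
// Timing for one synthesized chunk (hypothetical shape, for illustration).
interface ChunkTiming {
  synthesisMs: number; // wall-clock time spent synthesizing the chunk
  audioMs: number;     // duration of the audio that chunk produced
}

// Real-time factor: synthesis time divided by audio duration.
// Below 1.0 the device synthesizes faster than playback consumes audio.
function realTimeFactor(timings: ChunkTiming[]): number {
  let synthesisMs = 0;
  let audioMs = 0;
  for (const t of timings) {
    synthesisMs += t.synthesisMs;
    audioMs += t.audioMs;
  }
  return audioMs > 0 ? synthesisMs / audioMs : Infinity;
}

// Assumed policy: switch to cloud once the measured factor exceeds maxRtf.
// With no measurements yet there is nothing to judge, so stay local.
function shouldFallBackToCloud(timings: ChunkTiming[], maxRtf = 0.8): boolean {
  if (timings.length === 0) return false;
  return realTimeFactor(timings) > maxRtf;
}
```

A threshold below 1.0 leaves headroom so playback does not stall while the next chunk is still being synthesized. The right value would have to come from testing on real devices.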

What this guide covers

Browser Speech API vs neural local TTS

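The browser Speech API (the speechSynthesis interface of the Web Speech API) is built in, but the available voices and their quality vary across operating systems and browsers. Neural local TTS offers more control over voice and behavior, but requires model loading, runtime support, and careful text chunking.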
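
To make the inconsistency concrete: the set of voices differs per device, so code that uses the Speech API has to choose from whatever is exposed. The sketch below picks a voice for a language tag, preferring voices marked as local. `VoiceLike` and `pickVoice` are hypothetical names; the `name`, `lang`, and `localService` fields mirror properties of the standard `SpeechSynthesisVoice` object.

```typescript
// Minimal subset of SpeechSynthesisVoice needed for selection.
interface VoiceLike {
  name: string;
  lang: string;          // BCP 47 tag such as "en-US"
  localService: boolean; // true when the voice is provided on the device
}

// Normalize tags so "en_US" and "en-us" compare equal to "en-US".
function normalizeLang(tag: string): string {
  return tag.replace("_", "-").toLowerCase();
}

// Prefer an exact language match, then any voice in the same base language.
// Within the chosen pool, prefer a local voice over a remote one.
function pickVoice<T extends VoiceLike>(voices: readonly T[], lang: string): T | undefined {
  const wanted = normalizeLang(lang);
  const base = wanted.split("-")[0];
  const exact = voices.filter((v) => normalizeLang(v.lang) === wanted);
  const sameLanguage = voices.filter((v) => normalizeLang(v.lang).split("-")[0] === base);
  const pool = exact.length > 0 ? exact : sameLanguage;
  return pool.find((v) => v.localService) ?? pool[0];
}
```

In a browser this could be called as `pickVoice(speechSynthesis.getVoices(), "en-US")`. In some browsers `getVoices()` returns an empty list until the `voiceschanged` event has fired, so the call may need to be repeated after that event.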

Runtime expectations

A browser implementation may depend on WebGPU for speed, WebAssembly for broader fallback, and ONNX-style runtimes or custom inference code. The runtime decision can matter more than the model name.
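
A minimal sketch of that runtime decision follows. It assumes a three-way choice between WebGPU, WebAssembly, and cloud. The `detectCapabilities` and `pickRuntime` names and the 4 GiB memory floor are hypothetical. `navigator.gpu` is the WebGPU entry point and `navigator.deviceMemory` is a coarse hint that only some browsers expose, so a missing value is treated as unknown rather than as low memory.

```typescript
type Runtime = "webgpu" | "wasm" | "cloud";

interface Capabilities {
  webgpu: boolean;
  wasm: boolean;
  deviceMemoryGiB: number | null; // null when the browser does not report it
}

// Shape of the globals this sketch reads; injectable so it can run outside a browser.
interface EnvLike {
  navigator?: { gpu?: unknown; deviceMemory?: number };
  WebAssembly?: unknown;
}

function detectCapabilities(env: EnvLike = globalThis as unknown as EnvLike): Capabilities {
  const nav = env.navigator;
  return {
    // Presence of navigator.gpu only means the API exists, not that an adapter is usable.
    webgpu: nav?.gpu != null,
    wasm: typeof env.WebAssembly === "object" && env.WebAssembly !== null,
    deviceMemoryGiB: typeof nav?.deviceMemory === "number" ? nav.deviceMemory : null,
  };
}

// Assumed policy: reported low memory goes to cloud, then WebGPU, then WebAssembly.
function pickRuntime(caps: Capabilities, minMemoryGiB = 4): Runtime {
  const lowMemory = caps.deviceMemoryGiB !== null && caps.deviceMemoryGiB < minMemoryGiB;
  if (lowMemory) return "cloud";
  if (caps.webgpu) return "webgpu";
  if (caps.wasm) return "wasm";
  return "cloud";
}
```

Feature detection is only a first filter. `navigator.gpu.requestAdapter()` can still resolve to `null`, so a real implementation would confirm the adapter and likely run a short benchmark before committing to a local runtime.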

Latency and model size

The first run may need a visible model download. Long documents must be chunked so users hear audio quickly instead of waiting for the entire document to synthesize.
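
One way to chunk for fast first audio is to keep the first chunk short and let later chunks grow. The sketch below packs whole sentences into chunks under a character budget. The function names and the default limits of 120 and 400 characters are hypothetical, and the sentence splitter is deliberately naive: it splits on `.`, `!`, and `?`, so it will mis-split abbreviations and decimals.

```typescript
// Naive sentence splitter: text up to and including ., !, or ? runs.
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Pack sentences into chunks. The first chunk uses a smaller budget so
// playback can start sooner. A single sentence longer than the budget is
// kept whole as its own oversized chunk rather than split mid-sentence.
function chunkText(text: string, maxChars = 400, firstChunkMaxChars = 120): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    const limit = chunks.length === 0 ? firstChunkMaxChars : maxChars;
    if (current !== "" && current.length + 1 + sentence.length > limit) {
      chunks.push(current);
      current = sentence;
    } else {
      current = current === "" ? sentence : current + " " + sentence;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

Each chunk can then be synthesized and queued for playback while the next one is still being generated.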

Model and product notes

Kokoro for quality experiments

Kokoro, an open-weight model with roughly 82 million parameters, is attractive when voice quality matters and its download size and memory footprint fit a practical browser workflow.

Piper for compatibility fallback

Piper-style voices may be useful where fast, predictable local synthesis matters more than premium voice quality.

Cloud TTS for production reliability

Cloud voices remain the easier route to high quality, multilingual support, and predictable performance across devices.

Browser/local TTS options

| Category | Cloud reader today | Offline reader direction |
| --- | --- | --- |
| Browser Speech API | Instant but inconsistent voices | No model download, limited control |
| Kokoro-style local TTS | Not live in Sornic yet | Better quality target, needs runtime testing |
| Piper-style local TTS | Not live in Sornic yet | Compatibility fallback, voice quality varies |
| Cloud TTS | Current production path | Best quality and consistency, uses quota |
| Long documents | Cloud handles predictable synthesis | Needs chunking, caching, and progress UI |

Join the Local AI TTS waitlist

Get notified when Sornic starts testing browser-based local TTS engines as part of Offline Pack.

FAQ

What is local AI text-to-speech?

It means the voice model runs on your device instead of sending text to a cloud TTS provider.

Does local TTS need WebGPU?

Not always, but WebGPU can make browser AI faster. WebAssembly can serve as a more widely supported fallback, at lower performance.

How is this different from the browser Speech API?

The browser Speech API uses whatever voices the operating system or browser exposes. Local neural TTS would give Sornic more control over voice quality and behavior, but with more setup cost.

Will local TTS be faster than cloud TTS?

Not necessarily. Fast desktops may do well after model download, while older devices may be slower than cloud generation.

Why would Sornic still keep cloud TTS?

Cloud TTS is currently more reliable for high-quality voices, multilingual coverage, MP3 downloads, and low-friction first use.