pick your weapons 🧠

Models & Backends

YapYap separates speech transcription (STT) from text cleanup (LLM). You choose independently for each. Settings live in Settings → Models.

Speech-to-Text Models

These convert your voice to raw text. STT runs first, before the LLM sees anything.

| Model | Description | Size | Languages | Backend |
| --- | --- | --- | --- | --- |
| Parakeet TDT v3 (default) | Fastest. Runs on Neural Engine — no GPU/RAM competition with your LLM. | ~600 MB | 6 | FluidAudio |
| Whisper Large v3 Turbo | Best accuracy. Best for 10+ language support. CoreML on GPU. | ~1.5 GB | 10+ | WhisperKit |
| Whisper Medium | Good balance of speed and accuracy for 8 languages. | ~769 MB | 8 | WhisperKit |
| Whisper Small | Minimal footprint. Good choice for 8GB machines. | ~244 MB | 4 | WhisperKit |
| Apple Built-in (macOS 26+) | System model — no download, zero extra RAM. | System | 9 | SpeechAnalyzer |
When to use what
  • Parakeet: Your everyday default — fastest, ANE-accelerated, no RAM hit. English + 5 other languages.
  • Whisper Large: Non-English languages Parakeet doesn't cover (Chinese, Japanese, Korean, Hindi, Arabic, Russian) or when you need the absolute best accuracy.
  • Whisper Small: 8GB machines where you're short on disk/RAM, or when raw speed matters most.
  • Apple Built-in: macOS 26+ only. No download and zero extra RAM; a good secondary option when Parakeet isn't available.

LLM Cleanup Models

These take the raw STT transcript and clean it up: remove fillers, fix grammar, format for the active app.

You can also disable LLM cleanup entirely (Settings → Cleanup → Off) and get raw STT output pasted directly.

📏 Prompt complexity scales with model size
YapYap automatically adjusts how much instruction it gives based on model tier — small models get minimal prompts (they hallucinate on complex ones), large models get rich context. You don't tune this manually.
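That tier scaling boils down to a lookup from model tier to prompt detail. The sketch below is illustrative only — the tier names match the tables in this section, but the function and prompt texts are our assumptions, not YapYap's actual prompts:

```python
# Illustrative sketch of tier-scaled prompting. The prompts here are
# invented for illustration; only the tier names come from this page.
def cleanup_prompt(tier: str) -> str:
    """Return a cleanup instruction sized to the model tier."""
    prompts = {
        # Small models (<=2B) get terse instructions to avoid hallucination.
        "small": "Remove filler words. Fix grammar. Output only the text.",
        # Medium models (3B-4B) can follow a bit more context.
        "medium": ("Clean up this dictated text: remove fillers, fix grammar "
                   "and punctuation. Output only the cleaned text."),
        # Large models (7B-8B) get rich, app-aware instructions.
        "large": ("You are a dictation cleanup assistant. Remove fillers, "
                  "fix grammar, and format the text for the active app. "
                  "Preserve the speaker's meaning. Output only the result."),
    }
    return prompts[tier]
```

The point of the design: a 1B model given the "large" prompt tends to echo or invent instructions instead of following them, so shrinking the prompt with the model is safer than one universal prompt.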
Small tier (≤2B params — 8GB friendly)

| Model | Description | Size | Languages |
| --- | --- | --- | --- |
| Qwen 2.5 1.5B | Fastest multilingual option. | ~1.0 GB | 10 |
| Llama 3.2 1B | Fastest, English only. | ~700 MB | English |
| Gemma 3 1B | Ultra-fast, 140+ languages. | ~733 MB | 140+ |

Medium tier (3B-4B params — 16GB sweet spot)

| Model | Description | Size | Languages |
| --- | --- | --- | --- |
| Qwen 2.5 3B | Higher quality, multilingual. | ~2.0 GB | 10 |
| Llama 3.2 3B | Great English quality, fast. | ~2.0 GB | English |
| Gemma 3 4B (recommended) | Best instruction-following quality. 140+ languages. | ~3.0 GB | 140+ |

Large tier (7B-8B params — 16GB+ needed)

| Model | Description | Size | Languages |
| --- | --- | --- | --- |
| Qwen 2.5 7B | High quality multilingual rewrites. | ~4.7 GB | 10 |
| Llama 3.1 8B | Best English rewrite quality. | ~4.7 GB | English |
Choosing between families
  • Qwen 2.5: Best default for multilingual use (10 languages). Strong instruction-following at all sizes.
  • Llama 3.x: Best for English-only workflows. Slightly better English grammar than Qwen at the same parameter count.
  • Gemma 3: Best for non-English languages — supports 140+ languages at both 1B and 4B. The 4B is the recommended overall default.

Inference Backends

These are the engines that actually run the LLM. The backend choice is independent of which model you pick.

  • MLX (default): Apple's own ML framework. Fastest GPU utilization on M-series. Uses safetensors format from HuggingFace's mlx-community. No extra software required.
  • 🔧 llama.cpp: Embedded C++ engine. Runs GGUF format models — broader model compatibility, slightly lower memory for some models. No extra software required.
  • 🖥️ Ollama: Delegates to an external Ollama server. Lets you use any model Ollama supports, including ones not in YapYap's registry. Requires Ollama installed separately.
🖥️ Setting up Ollama
  1. Install Ollama: brew install ollama
  2. Start the server: ollama serve
  3. Pull a model: ollama pull gemma3:4b
  4. In YapYap: Settings → Models → LLM Backend → Ollama, enter model name: gemma3:4b

Default endpoint: http://localhost:11434. Change it in Settings if Ollama runs on a different port or a remote machine.
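To confirm the server is reachable and see which models you've pulled, you can query Ollama's /api/tags endpoint. The endpoint and its response shape come from Ollama's REST API; the helper names in this sketch are ours:

```python
# Minimal check that an Ollama server is up, using only the stdlib.
# GET /api/tags returns {"models": [{"name": "gemma3:4b", ...}, ...]}.
import json
import urllib.request

def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]

def list_ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Return the names of models the Ollama server has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))
```

If the call raises a connection error, the server isn't running (start it with ollama serve); if the list is empty, you still need to ollama pull a model.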

Recommended by RAM

💻 8 GB RAM (M1 / M2 base)
  • STT: Parakeet TDT v3 — runs on ANE, no RAM cost
  • LLM: Qwen 2.5 1.5B or Llama 3.2 1B, or disabled
  • Backend: MLX

🖥️ 16 GB RAM (M1 Pro / M2 Pro)
  • STT: Parakeet TDT v3
  • LLM: Gemma 3 4B (recommended default)
  • Backend: MLX

🚀 32 GB+ RAM (M1 Max / M2 Max / M3 Pro+)
  • STT: Whisper Large v3 Turbo or Parakeet
  • LLM: Qwen 2.5 7B or Llama 3.1 8B
  • Backend: MLX
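The guidance above reduces to a simple threshold lookup. A sketch under stated assumptions — the model names come from the tables on this page, but the function and the exact RAM cutoffs are illustrative, not YapYap's code:

```python
# Illustrative mapping from unified-memory size to the pairings
# recommended above. Thresholds are assumptions for the sketch.
def recommend(ram_gb: int) -> dict:
    """Suggest an STT model, LLM, and backend for a given RAM size."""
    if ram_gb >= 32:
        return {"stt": "Whisper Large v3 Turbo",
                "llm": "Qwen 2.5 7B", "backend": "MLX"}
    if ram_gb >= 16:
        return {"stt": "Parakeet TDT v3",
                "llm": "Gemma 3 4B", "backend": "MLX"}
    # 8 GB machines: keep STT on the ANE and the LLM small.
    return {"stt": "Parakeet TDT v3",
            "llm": "Qwen 2.5 1.5B", "backend": "MLX"}
```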
🎛️ Customize how YapYap writes
Set per-app styles, write custom prompts, and teach YapYap your vocabulary.
Customization →