pick your weapons 🧠

Models & Backends

YapYap separates speech transcription (STT) from text cleanup (LLM). You choose independently for each. Settings live in Settings → Models.

Speech-to-Text Models

These convert your voice to raw text. STT runs first, before the LLM sees anything.

| Model | Description | Size | Languages | Backend |
| --- | --- | --- | --- | --- |
| Parakeet TDT v3 (default) | Fastest. Runs on Neural Engine — no GPU/RAM competition with your LLM. | ~600 MB | 6 | FluidAudio |
| Whisper Large v3 Turbo | Best accuracy. Best for 10+ language support. CoreML on GPU. | ~1.5 GB | 10+ | WhisperKit |
| Whisper Medium | Good balance of speed and accuracy for 8 languages. | ~769 MB | 8 | WhisperKit |
| Whisper Small | Minimal footprint. Good choice for 8GB machines. | ~244 MB | 4 | WhisperKit |
| Apple Built-in (macOS 26+) | System model — no download, zero extra RAM. | System | 9 | SpeechAnalyzer |
When to use what
  • Parakeet: Your everyday default — fastest, ANE-accelerated, no RAM hit. English + 5 other languages.
  • Whisper Large: Non-English languages Parakeet doesn't cover (Chinese, Japanese, Korean, Hindi, Arabic, Russian) or when you need the absolute best accuracy.
  • Whisper Small: 8GB machines where you're short on disk/RAM, or when raw speed matters most.
  • Apple Built-in: macOS 26+ only. No download and zero extra RAM; a good secondary option when Parakeet isn't available.

LLM Cleanup Models

These take the raw STT transcript and clean it up: remove fillers, fix grammar, format for the active app.

You can also disable LLM cleanup entirely (Settings → Cleanup → Off) and get raw STT output pasted directly.

📏 Prompt complexity scales with model size
YapYap automatically adjusts how much instruction it gives based on model tier — small models get minimal prompts (they hallucinate on complex ones), large models get rich context. You don't tune this manually.
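That tier scaling boils down to a lookup from model tier to prompt detail. The sketch below is illustrative only — the tier names match the tables in this section, but the function and prompt texts are our assumptions, not YapYap's actual prompts:

```python
# Illustrative sketch of tier-scaled prompting. The prompts here are
# invented for illustration; only the tier names come from this page.
def cleanup_prompt(tier: str) -> str:
    """Return a cleanup instruction sized to the model tier."""
    prompts = {
        # Small models (<=2B) get terse instructions to avoid hallucination.
        "small": "Remove filler words. Fix grammar. Output only the text.",
        # Medium models (3B-4B) can follow a bit more context.
        "medium": ("Clean up this dictated text: remove fillers, fix grammar "
                   "and punctuation. Output only the cleaned text."),
        # Large models (7B-8B) get rich, app-aware instructions.
        "large": ("You are a dictation cleanup assistant. Remove fillers, "
                  "fix grammar, and format the text for the active app. "
                  "Preserve the speaker's meaning. Output only the result."),
    }
    return prompts[tier]
```

The point of the design: a 1B model given the "large" prompt tends to echo or invent instructions instead of following them, so shrinking the prompt with the model is safer than one universal prompt.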
Small tier (≤2B params — 8GB friendly)

| Model | Description | Size | Languages |
| --- | --- | --- | --- |
| Qwen 2.5 1.5B | Fastest multilingual option. | ~1.0 GB | 10 |
| Llama 3.2 1B | Fastest, English only. | ~700 MB | English |
| Gemma 3 1B | Ultra-fast, 140+ languages. | ~733 MB | 140+ |

Medium tier (3B-4B params — 16GB sweet spot)

| Model | Description | Size | Languages |
| --- | --- | --- | --- |
| Qwen 2.5 3B | Higher quality, multilingual. | ~2.0 GB | 10 |
| Llama 3.2 3B | Great English quality, fast. | ~2.0 GB | English |
| Gemma 3 4B (recommended) | Best instruction-following quality. 140+ languages. | ~3.0 GB | 140+ |

Large tier (7B-8B params — 16GB+ needed)

| Model | Description | Size | Languages |
| --- | --- | --- | --- |
| Qwen 2.5 7B | High quality multilingual rewrites. | ~4.7 GB | 10 |
| Llama 3.1 8B | Best English rewrite quality. | ~4.7 GB | English |
Choosing between families
  • Qwen 2.5: Best default for multilingual use (10 languages). Strong instruction-following at all sizes.
  • Llama 3.x: Best for English-only workflows. Slightly better English grammar than Qwen at the same parameter count.
  • Gemma 3: Best for non-English languages — supports 140+ languages at both 1B and 4B. The 4B is the recommended overall default.

Inference Backends

These are the engines that actually run the LLM. The backend choice is independent of which model you pick.

  • MLX (default): Apple's own ML framework. Fastest GPU utilization on M-series. Uses safetensors format from HuggingFace's mlx-community. No extra software required.
  • 🔧 llama.cpp: Embedded C++ engine. Runs GGUF format models — broader model compatibility, slightly lower memory for some models. No extra software required.
  • 🖥️ Ollama: Delegates to an external Ollama server. Lets you use any model Ollama supports, including ones not in YapYap's registry. Requires Ollama installed separately.
🖥️ Setting up Ollama
  1. Install Ollama: brew install ollama
  2. Start the server: ollama serve
  3. Pull a model: ollama pull gemma3:4b
  4. In YapYap: Settings → Models → LLM Backend → Ollama, enter model name: gemma3:4b

Default endpoint: http://localhost:11434. Change it in Settings if Ollama runs on a different port or a remote machine.
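To confirm the server is reachable and see which models you've pulled, you can query Ollama's /api/tags endpoint. The endpoint and its response shape come from Ollama's REST API; the helper names in this sketch are ours:

```python
# Minimal check that an Ollama server is up, using only the stdlib.
# GET /api/tags returns {"models": [{"name": "gemma3:4b", ...}, ...]}.
import json
import urllib.request

def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]

def list_ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Return the names of models the Ollama server has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))
```

If the call raises a connection error, the server isn't running (start it with ollama serve); if the list is empty, you still need to ollama pull a model.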

Recommended by RAM

💻 8 GB RAM (M1 / M2 base)
  • STT: Parakeet TDT v3 — runs on ANE, no RAM cost
  • LLM: Qwen 2.5 1.5B or Llama 3.2 1B, or disabled
  • Backend: MLX

🖥️ 16 GB RAM (M1 Pro / M2 Pro)
  • STT: Parakeet TDT v3
  • LLM: Gemma 3 4B (recommended default)
  • Backend: MLX

🚀 32 GB+ RAM (M1 Max / M2 Max / M3 Pro+)
  • STT: Whisper Large v3 Turbo or Parakeet
  • LLM: Qwen 2.5 7B or Llama 3.1 8B
  • Backend: MLX
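The guidance above reduces to a simple threshold lookup. A sketch under stated assumptions — the model names come from the tables on this page, but the function and the exact RAM cutoffs are illustrative, not YapYap's code:

```python
# Illustrative mapping from unified-memory size to the pairings
# recommended above. Thresholds are assumptions for the sketch.
def recommend(ram_gb: int) -> dict:
    """Suggest an STT model, LLM, and backend for a given RAM size."""
    if ram_gb >= 32:
        return {"stt": "Whisper Large v3 Turbo",
                "llm": "Qwen 2.5 7B", "backend": "MLX"}
    if ram_gb >= 16:
        return {"stt": "Parakeet TDT v3",
                "llm": "Gemma 3 4B", "backend": "MLX"}
    # 8 GB machines: keep STT on the ANE and the LLM small.
    return {"stt": "Parakeet TDT v3",
            "llm": "Qwen 2.5 1.5B", "backend": "MLX"}
```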
🎛️ Customize how YapYap writes
Set per-app styles, write custom prompts, and teach YapYap your vocabulary.
Customization →