YapYap Documentation
Everything you need to get up and running, pick the right models for your machine, and tune the AI to write exactly the way you want.
🚀
Getting Started
Install, grant permissions, download your first model, and transcribe in under 5 minutes.
🧠
Models & Backends
Pick the right speech and language model for your Mac's RAM, language, and quality needs.
🎛️
Customization
Custom prompts, per-app styles, personal dictionary, and VAD tuning.
❓
FAQ & Troubleshooting
Answers to the most common questions, and fixes for the most common problems.
How it works
Every recording goes through this pipeline — in under 3 seconds on a good machine.
1
Audio Capture
AVAudioEngine captures 16 kHz mono audio from your microphone while you hold the hotkey.
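The capture step can be sketched as follows. This is a minimal illustration, not YapYap's actual implementation: it assumes the mic's native format is resampled to 16 kHz mono with an AVAudioConverter, and omits error handling and buffer accumulation.

```swift
import AVFoundation

let engine = AVAudioEngine()
let input = engine.inputNode
let inputFormat = input.outputFormat(forBus: 0)

// Target format the STT models expect: 16 kHz, mono, 32-bit float PCM.
let targetFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                 sampleRate: 16_000,
                                 channels: 1,
                                 interleaved: false)!
let converter = AVAudioConverter(from: inputFormat, to: targetFormat)!

input.installTap(onBus: 0, bufferSize: 4096, format: inputFormat) { buffer, _ in
    let capacity = AVAudioFrameCount(
        Double(buffer.frameLength) * targetFormat.sampleRate / inputFormat.sampleRate)
    guard let out = AVAudioPCMBuffer(pcmFormat: targetFormat,
                                     frameCapacity: capacity) else { return }
    var consumed = false
    converter.convert(to: out, error: nil) { _, status in
        // Feed the tap buffer exactly once per conversion call.
        if consumed { status.pointee = .noDataNow; return nil }
        consumed = true
        status.pointee = .haveData
        return buffer
    }
    // `out` now holds 16 kHz mono samples ready for the VAD stage.
}

try engine.start()
```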
2
VAD Filtering
Silero VAD strips silence and background noise before audio reaches the STT model, preventing the hallucinated text Whisper is prone to producing on silent input.
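Conceptually, the VAD gate scores short fixed-size frames and keeps only speech. The sketch below assumes a hypothetical `speechProbability(_:)` wrapper around the Silero model (Silero operates on 512-sample frames at 16 kHz, about 32 ms); the real pipeline also applies smoothing and padding around speech segments.

```swift
// `speechProbability` is a stand-in for the Silero inference call,
// returning a per-frame speech probability in [0, 1].
func filterSpeech(_ samples: [Float],
                  frameSize: Int = 512,
                  threshold: Float = 0.5,
                  speechProbability: ([Float]) -> Float) -> [Float] {
    var kept: [Float] = []
    var start = 0
    while start + frameSize <= samples.count {
        let frame = Array(samples[start ..< start + frameSize])
        if speechProbability(frame) > threshold {
            kept.append(contentsOf: frame)   // keep speech frames
        }
        start += frameSize                   // drop silence / noise frames
    }
    return kept
}
```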
3
Speech-to-Text
Parakeet TDT v3 (Neural Engine) or Whisper (CoreML GPU) converts audio to raw text on-device.
4
LLM Cleanup
A local LLM (Qwen / Llama / Gemma via MLX, llama.cpp, or Ollama) removes fillers, fixes grammar, and formats for your active app.
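For the Ollama backend, the cleanup step amounts to one HTTP call to the local server. A minimal sketch, assuming Ollama's standard `/api/generate` endpoint; the model tag and prompt wording here are illustrative, not YapYap's actual defaults.

```swift
import Foundation

struct OllamaResponse: Decodable { let response: String }

func cleanup(_ raw: String) async throws -> String {
    var req = URLRequest(url: URL(string: "http://localhost:11434/api/generate")!)
    req.httpMethod = "POST"
    req.setValue("application/json", forHTTPHeaderField: "Content-Type")
    req.httpBody = try JSONSerialization.data(withJSONObject: [
        "model": "qwen2.5:3b",   // illustrative model tag
        "prompt": "Remove filler words and fix grammar:\n\(raw)",
        "stream": false          // single JSON response instead of a stream
    ])
    let (data, _) = try await URLSession.shared.data(for: req)
    return try JSONDecoder().decode(OllamaResponse.self, from: data).response
}
```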
5
Paste
Clean text is injected into your active app via clipboard + synthetic Cmd+V — no typing required.
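The paste step can be sketched with standard AppKit and Core Graphics APIs: put the text on the general pasteboard, then synthesize Cmd+V. This is an illustration of the technique, not YapYap's exact code, and posting synthetic key events requires the Accessibility permission.

```swift
import AppKit
import CoreGraphics

func paste(_ text: String) {
    let pb = NSPasteboard.general
    pb.clearContents()
    pb.setString(text, forType: .string)

    let src = CGEventSource(stateID: .hidSystemState)
    let vKey: CGKeyCode = 9   // 'V' on ANSI keyboard layouts
    let down = CGEvent(keyboardEventSource: src, virtualKey: vKey, keyDown: true)
    let up   = CGEvent(keyboardEventSource: src, virtualKey: vKey, keyDown: false)
    down?.flags = .maskCommand   // hold Cmd for both events
    up?.flags = .maskCommand
    down?.post(tap: .cghidEventTap)
    up?.post(tap: .cghidEventTap)
}
```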
Architecture
YapYap is a native Swift + SwiftUI app. No Electron, no web views, no cloud.
STT Layer
- WhisperKit (CoreML)
- FluidAudio / Parakeet (ANE)
- whisper.cpp (GGML)
- Apple SpeechAnalyzer (macOS 26+)
LLM Layer
- MLX Swift (safetensors)
- llama.cpp (GGUF)
- Ollama (HTTP API)
Context Layer
- NSWorkspace app detection
- Accessibility (AX) API window/field reading
- 11 app categories
- Per-category prompt rules
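The context layer's app detection can be sketched as a lookup from the frontmost app's bundle identifier to a prompt category. The bundle IDs and category names below are illustrative examples, not YapYap's actual category table.

```swift
import AppKit

func promptCategory() -> String {
    guard let bundleID = NSWorkspace.shared.frontmostApplication?.bundleIdentifier
    else { return "general" }
    switch bundleID {
    case "com.apple.mail", "com.microsoft.Outlook":
        return "email"        // formal tone, greetings/sign-offs
    case "com.tinyspeck.slackmacgap":
        return "chat"         // casual tone, short messages
    case "com.apple.dt.Xcode", "com.microsoft.VSCode":
        return "code"         // preserve identifiers, no auto-capitalization
    default:
        return "general"
    }
}
```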
Data & UI
- SwiftData (SQLite)
- AVAudioEngine
- Silero VAD
- Sparkle auto-update
- KeyboardShortcuts
- SwiftUI + AppKit hybrid
All models are stored in ~/Library/Application Support/YapYap/models/ — never in ~/Documents (iCloud eviction hazard).
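Resolving that directory with FileManager looks roughly like this sketch; the subpath matches the location stated above, while the directory-creation details are an assumption.

```swift
import Foundation

func modelsDirectory() throws -> URL {
    // ~/Library/Application Support, per-user domain.
    let appSupport = try FileManager.default.url(for: .applicationSupportDirectory,
                                                 in: .userDomainMask,
                                                 appropriateFor: nil,
                                                 create: true)
    let dir = appSupport
        .appendingPathComponent("YapYap")
        .appendingPathComponent("models")
    // Create the nested folders on first launch if they don't exist yet.
    try FileManager.default.createDirectory(at: dir,
                                            withIntermediateDirectories: true)
    return dir
}
```

Keeping models out of ~/Documents avoids iCloud Drive evicting multi-gigabyte files to free local disk space.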