Getting started
Ten minutes to a complete setup. You can skip the AI step and come back to it later — notes, playback, and search work without any API keys.
On first launch, VidNotes triggers a one-time download of two on-device AI models: WhisperKit (transcription, ~770 MB) and MLX (briefs & chat, ~2 GB). These run entirely on your Mac; no network calls after the initial download. Both are stored in Application Support and never re-downloaded.
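The two downloads together come to roughly 2.8 GB, so it's worth confirming you have room before first launch. A minimal check, assuming the sizes above (treating "~2 GB" as 2048 MB):

```shell
# Rough pre-launch disk check. Sizes from the docs:
# WhisperKit ~770 MB, MLX ~2 GB (assumed here as 2048 MB).
WHISPER_MB=770
MLX_MB=2048
NEED_MB=$((WHISPER_MB + MLX_MB))
echo "One-time download: ~${NEED_MB} MB"

# Free megabytes on the home volume (where Application Support lives):
AVAIL_MB=$(df -m "$HOME" | awk 'NR==2 {print $4}')
if [ "$AVAIL_MB" -ge "$NEED_MB" ]; then
  echo "Enough space: ${AVAIL_MB} MB available"
else
  echo "Low on space: ${AVAIL_MB} MB available, ~${NEED_MB} MB needed"
fi
```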
VidNotes ships with bundled on-device AI that works without any key. If you'd rather use a cloud provider (faster on long content, higher quality on technical briefs), connect one with your own API key; nothing goes through our servers. Wire up just one to start: open VidNotes → Settings → AI Providers, paste the key, and click Save.
Which should I get? Start with Google Gemini — it’s free, needs no credit card, and handles every AI feature in the app. Add Claude later if you want higher-quality long-form briefs.
Google Gemini
- Sign in with any Google account
- Click “Create API Key” → choose (or create) a project
- Copy the key that appears
- Paste into Settings → Google Gemini → Save
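If saving the key fails, check that the copy didn't truncate it. As a format assumption (current Google API keys, including Gemini keys, begin with `AIza`), a quick sanity check looks like this; the key below is a made-up placeholder:

```shell
# Hypothetical helper: sanity-check a pasted Google API key before saving.
# Assumption: current Google keys begin with "AIza"; a truncated copy won't.
check_google_key() {
  case "$1" in
    AIza*) echo "looks like a Google API key" ;;
    *)     echo "unexpected format, copy may be truncated" ;;
  esac
}

check_google_key "AIzaExampleExampleExample123"   # prints "looks like a Google API key"
```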
Anthropic (Claude)
- Sign up / sign in
- Settings → API Keys → “Create Key”
- Name it (e.g. “VidNotes”), copy the key
- Under Billing, add a payment method + $5 in credits
- Paste into Settings → Anthropic → Save
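A common failure mode with any of these keys is an invisible trailing newline or space picked up during copy. If a key you're sure is correct gets rejected, strip surrounding whitespace before pasting; the key below is a made-up placeholder:

```shell
# Strip whitespace/newlines that sometimes ride along with a copied key.
# (API keys contain no internal spaces, so tr -d is safe here.)
RAW_KEY='  sk-ant-example-key-123
'
CLEAN_KEY=$(printf '%s' "$RAW_KEY" | tr -d '[:space:]')
echo "raw length:   ${#RAW_KEY}"
echo "clean length: ${#CLEAN_KEY}"
echo "clean key:    ${CLEAN_KEY}"
```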
OpenAI
- Sign up / sign in
- API Keys → “Create new secret key” → copy
- Billing → add a payment method + at least $5 in credits
- Paste into Settings → OpenAI → Save
YouTube Data API
- Go to the link — it opens the YouTube Data API v3 page
- Click “Enable” on the API (creates a project if needed)
- Sidebar → Credentials → “Create Credentials” → “API key”
- Paste into Settings → YouTube Data API → Save
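If you ever want to confirm the key works outside the app, the documented YouTube Data API v3 `videos` endpoint is easy to hit by hand. The video ID and key below are placeholders; substitute your own before running curl:

```shell
# Build the documented YouTube Data API v3 "videos" request.
# VIDEO_ID and YT_KEY are placeholders, not real values.
VIDEO_ID="dQw4w9WgXcQ"
YT_KEY="AIza-your-key-here"
URL="https://www.googleapis.com/youtube/v3/videos?part=snippet&id=${VIDEO_ID}&key=${YT_KEY}"
echo "$URL"
# To actually exercise the key (requires network):
#   curl -s "$URL"
```

A valid key returns a JSON `items` array; an invalid one returns a JSON error explaining why.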
Drag any video file into the window or paste a YouTube URL. Press ⌘↩ to save a timestamped note. Press C to toggle captions if you’ve imported a transcript.
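A timestamped note pairs your text with the playback position. The note format itself is internal to VidNotes, but the display arithmetic (seconds into the video rendered as `MM:SS`, or `H:MM:SS` past the hour mark) can be sketched as:

```shell
# Format a playback position (in seconds) as H:MM:SS or MM:SS.
fmt_ts() {
  s=$1
  if [ "$s" -ge 3600 ]; then
    printf '%d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
  else
    printf '%02d:%02d\n' $((s / 60)) $((s % 60))
  fi
}

fmt_ts 75     # 01:15
fmt_ts 3725   # 1:02:05
```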
Click the Brief tab on the right, then Generate Brief. VidNotes transcribes the video (if not already done), extracts claims and entities in parallel segments, and produces a 4-layer research brief with clickable timestamps.
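"Parallel segments" means the transcript is chunked and each chunk is analyzed independently before the results are merged into the brief. VidNotes' actual segmentation is internal; the idea can be sketched with a plain line-based split (chunk count and filenames here are illustrative):

```shell
# Illustrative only: split a transcript into 4 roughly equal chunks,
# each of which could then be processed in parallel.
SEGMENTS=4
printf '%s\n' "line 1" "line 2" "line 3" "line 4" "line 5" "line 6" "line 7" > transcript.txt

total=$(wc -l < transcript.txt)
per=$(( (total + SEGMENTS - 1) / SEGMENTS ))   # ceiling division
split -l "$per" transcript.txt seg_

echo "created $(ls seg_* | wc -l | tr -d ' ') chunks"   # created 4 chunks
```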