Skip to content

Hardware requirements

Requirements depend on what you use: transcription only, cloud API summary, or offline local summary (Qwen).

Common prerequisites

Resource Required
Python 3.12+ (native install; Docker includes Python)
ffmpeg Required — extracts audio from video
CPU Modern x64; sequential processing (one job at a time)
NVIDIA GPU Optional — accelerates ASR transcription only

Comparison by scenario

Scenario System RAM Free disk Network Models on disk
Transcription only 8 GB min · 16 GB recommended ~6 GB Only at initial setup Parakeet ~2.5 GB
Transcription + API summary Same as above Same as above Yes during summary Parakeet only
Transcription + local summary ≥ 16 GB · 32 GB ideal ~8–10 GB Only at initial setup Parakeet + Qwen ~2 GB

RAM and local summary

Before local summary the pipeline unloads the ASR model from RAM. You need ≥ 16 GB total physical RAM (threshold in MIN_RAM_GB); below this threshold the Qwen engine is disabled.


1. Transcription only

Resource Detail
RAM Minimum 8 GB; 16 GB recommended for long audio (1 h+).
Disk ~2.5 GB ASR + ~3–4 GB dependencies + user output → ≥ 6 GB free.
Network Only for initial model download. Normal use is offline.
GPU Optional. On CPU: ~2× realtime (1 min audio ≈ 2 min processing).

2. Transcription + API summary

Resource Detail
RAM Same as transcription only — LLM not local.
Disk Same as transcription only — no summary model.
Network Required for API calls during summary.
API key UI /settings/summary or data/.secrets/summary_keys.json.

Suitable for PCs with 8 GB RAM if you use cloud only.


3. Transcription + local summary (Qwen)

Resource Detail
RAM ≥ 16 GB (checked at startup). 32 GB recommended (mini PC, long files).
Disk Transcription only + ~2 GB Qwen GGUF.
Network Only for initial Qwen download. Then offline.
CPU Summary is CPU-bound; long texts: several minutes (map-reduce).
Docker Auto-download Qwen on first start if RAM ≥ 16 GB (~10–20 min, one-time).

See also Models and Docker.