File formats
Supported input
Audio
.wav, .mp3, .flac, .m4a, .ogg, .opus, .aac, .wma
Video
.mp4, .mkv, .avi, .mov, .webm, .m4v, .flv, .wmv
For every input, ffmpeg extracts the audio and converts it to 16 kHz mono PCM, the format NeMo requires.
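The extraction step can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names `is_supported` and `build_ffmpeg_command` are hypothetical, and the 16-bit sample format (`pcm_s16le`) is an assumption, since the text above only specifies 16 kHz mono PCM.

```python
from pathlib import Path

# Extensions copied from the lists above.
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".opus", ".aac", ".wma"}
VIDEO_EXTS = {".mp4", ".mkv", ".avi", ".mov", ".webm", ".m4v", ".flv", ".wmv"}


def is_supported(path: str) -> bool:
    """Return True if the file extension appears in one of the lists above."""
    return Path(path).suffix.lower() in AUDIO_EXTS | VIDEO_EXTS


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg argument list that writes 16 kHz mono PCM audio.

    Assumption: 16-bit little-endian samples (pcm_s16le).
    """
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input file (audio or video)
        "-vn",                # drop any video stream
        "-ac", "1",           # one channel (mono)
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # assumed PCM sample format
        dst,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.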
Output per job
| File | Format | Content |
|---|---|---|
| trascrizione.txt | Plain UTF-8 text | Full transcript |
| sottotitoli.srt | SubRip | Segments with timestamps |
| riassunto.txt | Plain UTF-8 text | Summary (if generated) |
| source.* | Original | Copy of the uploaded file |
| job.json | JSON | Job metadata |
SRT format
Standard SubRip:
1
00:00:00,000 --> 00:00:05,120
First transcribed sentence.

2
00:00:05,120 --> 00:00:10,450
Second sentence.
The file is generated by src/sbobinator/export.py from NeMo timestamps, when they are available.
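As an illustration of the SubRip layout, the sketch below formats a list of timed segments. It is not the code in src/sbobinator/export.py: the function names and the `(start, end, text)` tuple shape are assumptions made for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm (SubRip uses a comma)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start_sec, end_sec, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Calling `to_srt([(0.0, 5.12, "First transcribed sentence."), (5.12, 10.45, "Second sentence.")])` produces the two cues shown above.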
Long files
| Threshold | Behavior |
|---|---|
| ≤ 30 minutes | Transcribed in a single pass |
| > 30 minutes | Split into 30 s chunks with 2 s overlap, then merged |
These values are configurable in TranscribeConfig (chunk_threshold_sec, etc.).
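The chunking rule in the table can be sketched as below. This only plans chunk boundaries and does not show how overlapping transcripts are merged. The function name `plan_chunks` and the parameters `chunk_sec` and `overlap_sec` are hypothetical; only `chunk_threshold_sec` is named in the text, and its default here (1800 s) is taken from the 30-minute threshold in the table.

```python
def plan_chunks(
    duration_sec: float,
    chunk_threshold_sec: float = 1800.0,  # 30 minutes, per the table
    chunk_sec: float = 30.0,              # hypothetical parameter name
    overlap_sec: float = 2.0,             # hypothetical parameter name
) -> list[tuple[float, float]]:
    """Return (start, end) windows in seconds covering the whole file."""
    if overlap_sec >= chunk_sec:
        raise ValueError("overlap must be shorter than the chunk length")
    if duration_sec <= chunk_threshold_sec:
        return [(0.0, duration_sec)]  # short file: one full-length window

    step = chunk_sec - overlap_sec  # each chunk starts 28 s after the previous
    chunks = []
    start = 0.0
    while True:
        end = min(start + chunk_sec, duration_sec)
        chunks.append((start, end))
        if end >= duration_sec:
            break
        start += step
    return chunks
```

With these defaults, consecutive windows share 2 s of audio, which gives a merge step some context on both sides of each boundary.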
Legacy output (--legacy-output)
With --legacy-output, files are written without a job folder and may overwrite existing files with the same name. Prefer the default job mode.
Encoding
All text files are encoded as UTF-8.
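When reading or writing these files from Python, it is safer to pass the encoding explicitly, because the platform default is not always UTF-8. The helpers below are a hypothetical sketch and are not part of the project; the default file name is taken from the output table.

```python
from pathlib import Path


def write_text_output(job_dir: str, name: str, text: str) -> None:
    """Write a text output as UTF-8, regardless of the platform default."""
    (Path(job_dir) / name).write_text(text, encoding="utf-8")


def read_text_output(job_dir: str, name: str = "trascrizione.txt") -> str:
    """Read a text output from a job folder, decoding it as UTF-8."""
    return (Path(job_dir) / name).read_text(encoding="utf-8")
```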