File formats

Supported input

Audio: .wav, .mp3, .flac, .m4a, .ogg, .opus, .aac, .wma

Video: .mp4, .mkv, .avi, .mov, .webm, .m4v, .flv, .wmv

Audio is extracted with ffmpeg and converted to 16 kHz mono PCM, the input format NeMo requires.
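As a rough illustration of the extraction step, the sketch below checks the file extension against the lists above and builds an ffmpeg command that produces 16 kHz mono PCM. The function names (`media_kind`, `ffmpeg_extract_cmd`) are hypothetical, and the 16-bit little-endian codec (`pcm_s16le`) is an assumption, since the text only says "PCM". The project's actual invocation may differ.

```python
from pathlib import Path

# Extensions taken from the supported-input lists above.
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".opus", ".aac", ".wma"}
VIDEO_EXTS = {".mp4", ".mkv", ".avi", ".mov", ".webm", ".m4v", ".flv", ".wmv"}


def media_kind(path):
    """Return "audio" or "video" based on the file extension (case-insensitive)."""
    ext = Path(path).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"Unsupported input format: {ext or '(no extension)'}")


def ffmpeg_extract_cmd(src, dst):
    """Build (but do not run) an ffmpeg command for 16 kHz mono PCM output."""
    media_kind(src)  # raises ValueError for unsupported extensions
    return [
        "ffmpeg",
        "-i", str(src),
        "-vn",                # drop any video stream
        "-ac", "1",           # mono
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # assumption: 16-bit little-endian PCM
        str(dst),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.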

Subtitles use the standard SubRip (.srt) format:

1
00:00:00,000 --> 00:00:05,120
First transcribed sentence.

2
00:00:05,120 --> 00:00:10,450
Second sentence.

The file is generated by src/sbobinator/export.py from NeMo timestamps, when they are available.
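To make the SubRip layout concrete, here is a minimal sketch that turns timestamped segments into SRT text. It is not the code in src/sbobinator/export.py. The function names and the `(start_sec, end_sec, text)` segment shape are assumptions chosen for illustration.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm (SubRip uses a comma before ms)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start_sec, end_sec, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by one blank line.
    return "\n".join(blocks)
```

Calling `to_srt([(0.0, 5.12, "First transcribed sentence."), (5.12, 10.45, "Second sentence.")])` reproduces the two cues shown above.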

Long audio is handled according to its duration:

Audio duration	Behavior
≤ 30 minutes	Full transcription
> 30 minutes	Split into 30 s chunks with 2 s overlap, then merge

These values are configurable in TranscribeConfig (chunk_threshold_sec, etc.).
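The chunking rule can be sketched as follows. This illustrates the arithmetic only and is not the project's implementation. `ChunkPlan` is a hypothetical stand-in for TranscribeConfig, and only `chunk_threshold_sec` is a name taken from the text; `chunk_len_sec` and `overlap_sec` are assumed names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ChunkPlan:
    """Hypothetical stand-in for TranscribeConfig."""
    chunk_threshold_sec: float = 30 * 60  # name from the docs: 30 minutes
    chunk_len_sec: float = 30.0           # assumed field name
    overlap_sec: float = 2.0              # assumed field name


def plan_chunks(duration_sec: float,
                cfg: Optional[ChunkPlan] = None) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds covering the whole recording."""
    cfg = cfg or ChunkPlan()
    if duration_sec <= cfg.chunk_threshold_sec:
        # Short enough: one window spanning the full audio.
        return [(0.0, duration_sec)]

    step = cfg.chunk_len_sec - cfg.overlap_sec  # 28 s with the defaults
    chunks = []
    start = 0.0
    while True:
        end = min(start + cfg.chunk_len_sec, duration_sec)
        chunks.append((start, end))
        if end >= duration_sec:
            break
        start += step
    return chunks
```

With the defaults, each window starts 28 s after the previous one, so consecutive windows share 2 s of audio that the merge step can use to stitch the text together.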

Without a job folder, output files are written directly to data/output/:

data/output/
├── nomestem.txt
├── nomestem.srt
└── nomestem_riassunto.txt

This mode may overwrite existing files with the same name; prefer the default job mode.
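The overwrite risk can be checked before writing. The sketch below assumes that `nomestem` stands for the input file's stem and that the three suffixes are exactly those in the tree above. The function names and dictionary keys are hypothetical.

```python
from pathlib import Path


def flat_output_paths(input_file, out_dir="data/output"):
    """Map an input file to the three flat-mode output paths."""
    stem = Path(input_file).stem  # assumption: "nomestem" is the input stem
    out = Path(out_dir)
    return {
        "transcript": out / f"{stem}.txt",
        "subtitles": out / f"{stem}.srt",
        "summary": out / f"{stem}_riassunto.txt",
    }


def existing_targets(paths):
    """Return the output paths that already exist and would be overwritten."""
    return [p for p in paths.values() if p.exists()]


def write_new_text(path, text):
    """Write UTF-8 text, raising FileExistsError instead of overwriting."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "x", encoding="utf-8") as handle:  # "x": fail if file exists
        handle.write(text)
```

Two inputs with the same stem, for example `lecture.mp3` and `lecture.mp4`, map to the same three paths, which is the collision that job mode avoids.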

All text files are encoded as UTF-8.