File formats

Supported input

Audio

.wav, .mp3, .flac, .m4a, .ogg, .opus, .aac, .wma

Video

.mp4, .mkv, .avi, .mov, .webm, .m4v, .flv, .wmv

Audio is extracted with ffmpeg and converted to 16 kHz mono PCM, as required by NeMo.
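
As a rough illustration of the extraction step, the sketch below builds an ffmpeg command that drops any video stream and resamples to 16 kHz mono PCM. The helper names `build_extract_cmd` and `extract_audio`, the 16-bit sample format, and the exact flag set are assumptions; the project's actual invocation is not shown here and may differ.

```python
import subprocess
from pathlib import Path


def build_extract_cmd(source: Path, target: Path) -> list[str]:
    """Build an ffmpeg command producing a 16 kHz mono PCM WAV (hypothetical helper)."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the target if it already exists
        "-i", str(source),    # input: any supported audio or video file
        "-vn",                # ignore video streams
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM (assumed sample format)
        str(target),
    ]


def extract_audio(source: Path, target: Path) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(build_extract_cmd(source, target), check=True)
```

Keeping command construction separate from execution makes the flags easy to inspect without ffmpeg installed.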


Output per job

File              Format            Content
trascrizione.txt  Plain UTF-8 text  Full transcript
sottotitoli.srt   SubRip            Segments with timestamps
riassunto.txt     Plain UTF-8 text  Summary (if generated)
source.*          Original          Copy of the uploaded file
job.json          JSON              Job metadata

SRT format

Standard SubRip:

1
00:00:00,000 --> 00:00:05,120
First transcribed sentence.

2
00:00:05,120 --> 00:00:10,450
Second sentence.

Generated by src/sbobinator/export.py from NeMo timestamps (when available).
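
To make the layout concrete, here is a minimal sketch of rendering timed segments as SubRip. It is not the code in export.py; the function names and the `(start, end, text)` tuple shape are assumptions chosen for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SubRip HH:MM:SS,mmm form."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(cues)
```

Calling `to_srt([(0.0, 5.12, "First transcribed sentence."), (5.12, 10.45, "Second sentence.")])` reproduces the two cues shown above.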


Long files

Threshold      Behavior
≤ 30 minutes   Full transcription
> 30 minutes   30 s chunks with 2 s overlap, then merge

Configurable in TranscribeConfig (chunk_threshold_sec, etc.).
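
The chunking rule can be sketched as a function that returns the time window of each chunk. Only `chunk_threshold_sec` is named in this page; the `plan_chunks` function and the `chunk_sec` and `overlap_sec` parameters are hypothetical names used for illustration, and the merge step is not shown because its details are not documented here.

```python
def plan_chunks(
    duration_sec: float,
    chunk_threshold_sec: float = 1800.0,  # 30 minutes
    chunk_sec: float = 30.0,              # hypothetical parameter name
    overlap_sec: float = 2.0,             # hypothetical parameter name
) -> list[tuple[float, float]]:
    """Return (start, end) windows in seconds for a file of the given duration."""
    if overlap_sec >= chunk_sec:
        raise ValueError("overlap must be shorter than the chunk length")
    # At or below the threshold the file is transcribed in one pass.
    if duration_sec <= chunk_threshold_sec:
        return [(0.0, float(duration_sec))]

    step = chunk_sec - overlap_sec  # each chunk starts 28 s after the previous one
    chunks = []
    start = 0.0
    while True:
        end = min(start + chunk_sec, float(duration_sec))
        chunks.append((start, end))
        if end >= duration_sec:
            break
        start += step
    return chunks
```

With the defaults, consecutive windows share 2 s of audio, for example (0, 30) and (28, 58), and the final window is shortened to end at the file's duration.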


Legacy output (--legacy-output)

Without a job folder:

data/output/
├── nomestem.txt
├── nomestem.srt
└── nomestem_riassunto.txt

May overwrite files with the same name. Prefer the default job mode.
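
The overwrite risk follows from how legacy names are derived: only the input's stem is used. The sketch below mirrors the layout shown above; the `legacy_paths` function and its dictionary keys are hypothetical, not part of the project's API.

```python
from pathlib import Path


def legacy_paths(source: str, out_dir: str = "data/output") -> dict[str, Path]:
    """Derive legacy output paths from the input file's stem (hypothetical helper)."""
    stem = Path(source).stem  # directory and extension are discarded
    base = Path(out_dir)
    return {
        "transcript": base / f"{stem}.txt",
        "subtitles": base / f"{stem}.srt",
        "summary": base / f"{stem}_riassunto.txt",
    }
```

Two inputs such as `a/lecture.mp4` and `b/lecture.wav` map to the same three paths, so the second run would replace the first run's files.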


Encoding

All text files: UTF-8.