Hello,
I’m experiencing a reproducible issue with Speech Recognition / Speech-to-Text in Kdenlive 25.12.1 on Linux, and I’d like to report it with as much technical detail as possible.
Environment
- Kdenlive version: 25.12.1
- OS: Linux (Ubuntu-based)
- UI language: pt-BR
- Speech engines tested: VOSK and Whisper
- Language model: pt-BR (installed and detected correctly)
- Feature status: Speech Recognition configured
Steps to reproduce
- Open any project with a video clip containing clear, audible speech
- Ungroup the clip and select only the audio clip
- Go to Sequence → Subtitles → Speech Recognition
- Choose either VOSK or Whisper
- Start processing
Observed behavior
- Processing starts normally
- When it finishes, Kdenlive shows the error: “The selected file /tmp/xxxx.srt is invalid”
- No subtitles are created in the timeline
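To check whether the .srt left behind is empty or malformed, independent of Kdenlive's error message, I used a quick check like the one below. It is only a rough heuristic of my own (count blocks that have an index line, a "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing line and at least one text line). It is not Kdenlive's actual validation logic, which I have not looked at.

```python
import re
import sys

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def count_srt_cues(text: str) -> int:
    """Count cues that have an index line, a timing line and at least one text line."""
    # Normalise Windows line endings and drop a leading UTF-8 BOM if present.
    text = text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    cues = 0
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text):
        lines = block.split("\n")
        if (
            len(lines) >= 3
            and lines[0].strip().isdigit()
            and TIMING.match(lines[1].strip())
        ):
            cues += 1
    return cues


if __name__ == "__main__":
    # Pass one or more .srt paths on the command line.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            n = count_srt_cues(f.read())
        note = "  -> empty or malformed" if n == 0 else ""
        print(f"{path}: {n} cue(s){note}")
```

A result of 0 cues on the temporary file would match the "empty/malformed" case described in this report.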
Critical observation
While monitoring the /tmp directory during processing, I observed that:
- Kdenlive generates a temporary .wav file
- This .wav file is:
  - Very small
  - Almost empty
  - Contains only low-level noise, similar to an open microphone
  - Does NOT contain the actual audio from the selected clip
Because of this, no valid .srt file is generated (or the file is empty/malformed).
This indicates that the failure happens before the speech recognition engine runs.
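To put numbers on "almost empty" and "low-level noise", the temporary .wav can be measured with a small standard-library script like this one. It is a minimal sketch that assumes the file is 16-bit PCM (I have not confirmed which sample format Kdenlive writes) and reports duration plus peak and RMS level in dBFS across all channels. A clip with clear speech should typically peak well above the levels of an open-microphone noise floor.

```python
import array
import math
import sys
import wave


def to_dbfs(value: float) -> float:
    """Convert a 16-bit sample magnitude to dBFS (full scale = 32768)."""
    return 20 * math.log10(value / 32768) if value > 0 else -math.inf


def wav_levels(path: str) -> dict:
    """Return duration (s), peak and RMS level (dBFS) of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        raw = wf.readframes(n_frames)

    # Interleaved samples from all channels; WAV data is little-endian.
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()

    duration = n_frames / rate if rate else 0.0
    if not samples:
        return {"duration_s": duration, "peak_dbfs": -math.inf, "rms_dbfs": -math.inf}

    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return {"duration_s": duration, "peak_dbfs": to_dbfs(peak), "rms_dbfs": to_dbfs(rms)}


if __name__ == "__main__":
    # Pass one or more .wav paths on the command line, e.g. the file seen in /tmp.
    for path in sys.argv[1:]:
        print(path, wav_levels(path))
```

Running it on both the temporary file and a manual WAV export of the same clip gives a direct side-by-side comparison.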
Implications
The issue appears to be related to:
- Audio extraction or rendering for speech recognition
- Possibly the MLT audio consumer used internally
- Subtitle system integration introduced in the 25.12.x branch
Both VOSK and Whisper behave the same, which strongly suggests that the engines themselves are not the root cause.
What works
I found two workarounds:
- Manually create a subtitle track before running speech recognition:
  - Sequence → Subtitles → Add Subtitle Track (SRT)
  - After this, speech recognition works correctly
- Enable “Save to file”, choose a path outside /tmp, then import the .srt manually
These workarounds avoid the automatic SRT creation/import step.
Additional notes
- Audio playback in the timeline works correctly
- Audio meters show correct levels
- Manually exporting the same audio clip to WAV works correctly
- Cleaning /tmp does not change the behavior
Questions
- Is this a known bug in Kdenlive 25.12.x?
- Is this related to recent changes in the subtitle system or speech recognition pipeline?
- Is there a recommended workflow to avoid this issue?
I’m happy to provide logs, debug output, or additional tests if needed.
Thanks for your time!
