How to improve speech recognition

I recently discovered Kdenlive and love it. I use it mostly for creating subtitles through its speech recognition. Although it is very good, I have run into some issues that I hope someone more experienced with this software can help resolve.

I have the latest Kdenlive update and have tried both VOSK and Whisper. I tried every model on the same 5-minute video clip as a comparison. The Medium English Only model (1.5GB) seems to work best for me. However, I don't understand why so many of the generated SRT files have multiple lines at the beginning containing only the word "You".
It makes no sense. There is no speech in the background of the video at that point, not even music. Where is it getting this "You" from?

Other times I see multiple lines that contain exactly the same sentence. I cleared the cache on my Linux Mint system and in Kdenlive, and I think that helped a bit.
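
To make the two problems concrete, here is a minimal Python sketch of the kind of after-the-fact cleanup that would remove them. It is only a stopgap and does not fix the recognition itself. It assumes a standard SRT layout (index line, timestamp line, text lines, blank line between cues); the function name `clean_srt` and the `junk` parameter are made up for illustration and are not part of Kdenlive, VOSK or Whisper.

```python
import re


def clean_srt(srt_text, junk=("you",)):
    """Drop cues that contain only a junk word, and cues that repeat
    the text of the previously kept cue. Remaining cues are renumbered."""
    # Cues are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    kept = []
    prev_norm = None
    for block in blocks:
        lines = block.splitlines()
        # Locate the timestamp line; skip blocks that have none.
        ts_idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if ts_idx is None:
            continue
        timing = lines[ts_idx].strip()
        text_lines = lines[ts_idx + 1:]
        text = " ".join(line.strip() for line in text_lines)
        # Normalise for comparison: no punctuation, lower case.
        norm = re.sub(r"[^\w\s]", "", text).strip().lower()
        if not norm or norm in junk:
            continue  # empty cue or a stray "You"
        if norm == prev_norm:
            continue  # same sentence as the previous cue (its timing is discarded)
        prev_norm = norm
        kept.append((timing, text_lines))
    # Rebuild the file with fresh, consecutive cue numbers.
    out = [
        f"{n}\n{timing}\n" + "\n".join(text_lines)
        for n, (timing, text_lines) in enumerate(kept, start=1)
    ]
    return "\n\n".join(out) + "\n"
```

It could be run on an exported file with something like `print(clean_srt(open("input.srt", encoding="utf-8").read()))`, where `input.srt` is a placeholder name. Note that it would also delete a genuine cue that consists only of the word "You".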

Can anyone give me advice on how to improve the SRT files by solving these problems?

Much appreciated, thanks.