Does Kdenlive need Nvidia CUDA installed for GPU rendering under Linux?

This question has been asked here before, but in my view there has not been a sufficient answer yet. I searched the web and found this: https://forum.endeavouros.com/t/kdenlive-needs-cuda-package-to-use-gpu-acceleration/36769
There they claim that installing the complete CUDA package is a must for GPU rendering in Kdenlive.

Before looking for an answer, we should state what kind of installation Kdenlive is: a native package, a Flatpak, or an AppImage?
Because I had run into some difficulties, I installed Kdenlive as a Flatpak on Manjaro XFCE. I am using an Nvidia GeForce GTX 1650 with 4 GB of VRAM plus the Intel HD 630 graphics integrated in my i5-7400 CPU. The PC has 16 GB of RAM.

About a year ago, I saw a utilization of about 15% to 20% of the GPU's video engine when rendering with the experimental NVENC H.264 encoder, which increased rendering speed by up to 40% compared to CPU rendering.

Currently, on the same computer system, I am running Kdenlive 23.04.2 as a Flatpak, with no CUDA installed. I can still choose the NVENC H.264 encoder for rendering, but it will not use the video engine of the GPU / graphics card.
Since I currently have to work with three 1080p video files of about 16 GB each, I have to use proxy clips. And I want GPU video rendering for the final render as well.

My questions:
If I keep this configuration, will I be able to use GPU video rendering after installing the CUDA package?

The CUDA package is about 5.8 GB because it contains all the developer utilities, not only the CUDA runtime. But there is an ffmpeg-cuda package available. Would that do the trick instead of the big CUDA package?
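As far as I know, ffmpeg's NVENC encoder does not need the CUDA toolkit at all; it only needs the encode library that ships with the NVIDIA driver (libnvidia-encode). A quick way to check is to ask the ffmpeg build whether the encoder is present and whether a short test encode succeeds. This is a sketch, assuming the standard `org.kde.kdenlive` Flatpak app ID for the sandbox check:

```shell
# 1) Is the NVENC encoder compiled into this ffmpeg build?
ffmpeg -hide_banner -encoders | grep -i nvenc

# 2) Can it actually open the encoder? A short synthetic test encode
#    fails with a clear error if the driver's encode library is missing.
ffmpeg -f lavfi -i testsrc2=duration=3:size=1920x1080:rate=50 \
       -c:v h264_nvenc -f null -

# 3) Same check inside the Kdenlive Flatpak sandbox, which bundles its own ffmpeg:
flatpak run --command=ffmpeg org.kde.kdenlive -hide_banner -encoders | grep -i nvenc
```

If step 2 succeeds without the CUDA package installed, the driver alone is sufficient for NVENC encoding.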

Or could it be that I only get GPU video rendering if I use the Kdenlive AppImage package?

An update, since I had to proceed with my work:
My original video clips are 1080p 50fps @ 50 Mb/s with PCM audio, 2ch 16bit @ 48 kHz.

I put two of these clips into Kdenlive and immediately created proxy files from them. While this process was running, the load on the GPU's video engine was between 66% and 72%. In the task manager, there were two ffmpeg processes running, each causing up to 50% CPU load.
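For reference, proxy generation is plain ffmpeg transcoding, which is why it can drive the NVENC engine so hard. Something along these lines reproduces the pattern of two parallel ffmpeg processes; this is only a sketch with made-up file names, and the actual flags depend on the proxy profile configured in Kdenlive:

```shell
# Two independent transcodes: decode and scale on the CPU, encode on NVENC.
# Running them in parallel matches the two ffmpeg tasks seen in the task manager.
ffmpeg -i cam1.MTS -vf scale=-2:540 -c:v h264_nvenc -b:v 3M -c:a aac cam1-proxy.mkv &
ffmpeg -i cam2.MTS -vf scale=-2:540 -c:v h264_nvenc -b:v 3M -c:a aac cam2-proxy.mkv &
wait
```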

This result shows that GPU encoding is working without the CUDA package installed.

Given this result, I would expect rendering my video clips to be handled by the GPU just like the proxy creation. Instead, the load on the GPU's video engine never exceeded about 6%, and there was a single ffmpeg process in the task manager taking about 50% CPU load. Rendering ran at a mere 35 fps.

I ran tests with several small clips of 1 to 3 minutes, rendering down from 1080p (as described above) to 720p 50fps @ 6000 kbps, 560p 50fps @ 3000 kbps, and 480p 50fps @ 2000 kbps. The rendering times were surprising and illogical:

Clip 1: 720p 4:18 min / 560p 1:54 min / 480p 3:50 min
Clip 2: 720p 8:50 min / 560p 4:18 min / 480p 10:18 min
Clip 3: 720p 4:29 min / 560p 2:02 min / 480p 4:21 min
Clip 4: 720p 4:22 min / 560p 1:53 min / 480p 3:57 min

All these clips contained only cuts between the two cameras / clips, with no effects, transitions, or titles/text applied.

What can I do to improve these results and shorten rendering time, using the GPU as intensively as during proxy generation?

I think this has to do with the fact that proxy generation and transcoding tasks are “outsourced” to ffmpeg entirely. IIRC, you can even start working on your project while these tasks run in the background.

Rendering, however, is a different story. Even though it is also "outsourced" to MELT, MELT handles the effect application and compositing frame by frame and only then sends each frame to ffmpeg for encoding. So even if ffmpeg uses the GPU for the encode step, MELT does not (yet?), hence the longer render times.
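You can see this split on the command line: Kdenlive saves the project as an MLT file, and the render job is essentially a melt invocation where melt composites every frame in software and the avformat consumer hands the finished frames to ffmpeg's encoder. A rough sketch, with placeholder project and output names:

```shell
# melt composites frame by frame on the CPU; only the final encode step
# (vcodec=h264_nvenc) runs on the GPU's video engine, which is why NVENC
# load stays low whenever CPU-side compositing is the bottleneck.
melt project.mlt -consumer avformat:output.mp4 \
     vcodec=h264_nvenc b=6M acodec=aac ab=160k
```

This is also why proxy generation (a pure ffmpeg job) can hit ~70% NVENC load while a timeline render with the same encoder barely reaches 6%.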