I work heavily with Kdenlive on larger projects with 4K footage and a lot of filters. Now I’m considering buying a new computer specifically for video editing. According to some tests and videos, AMD Threadripper should be the best choice.
But my personal observation of CPU loads on my current system makes me wonder what would really boost rendering performance.
Currently I’m working on a 13th-gen Intel i7 with 10 cores/12 threads, and while it is rendering a video the total system load is (only) somewhere between 2 and 3. CPU stats show that the workload is spread across all cores, but each one is only at 20 to 30 percent. The machine has plenty of memory, so it is nowhere near swapping, and disk throughput during rendering is negligible too.
I would expect all cores to ramp up to almost 100 percent, but they don’t. This in turn suggests that buying an AMD Threadripper could be a waste of money if its many cores then just sit there “idling”.
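For anyone who wants to reproduce this kind of per-core observation on Linux, here is a minimal sketch that derives per-core utilisation from two snapshots of /proc/stat. The function names are my own, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal). It is only an illustration, not a replacement for top or htop.

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} for every per-core 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate "cpu" line and everything that is not a per-core line.
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        # Assumed field order: user nice system idle iowait irq softirq steal
        ticks = [int(value) for value in parts[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks)
        cores[parts[0]] = (total - idle, total)
    return cores


def utilisation(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    result = {}
    for name, (busy_after, total_after) in after.items():
        busy_before, total_before = before[name]
        elapsed = total_after - total_before
        result[name] = 100.0 * (busy_after - busy_before) / elapsed if elapsed else 0.0
    return result


def sample_live(interval=5.0, path="/proc/stat"):
    """Take two snapshots 'interval' seconds apart (Linux only)."""
    with open(path) as handle:
        before = parse_proc_stat(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_proc_stat(handle.read())
    return utilisation(before, after)
```

Calling sample_live() while a render is running returns a dict that maps core names to busy percentages. One core near 100 percent with the rest almost idle points to a single-threaded bottleneck, while evenly low numbers like the ones described above may mean the threads spend much of their time waiting on each other.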
This is one of those “how long is a piece of string” questions. The answer(s) totally depend on what you’re optimising for and what your real world bottlenecks are.
Some things are parallelisable and some things are not. Some things that are currently not, eventually will be, and some just by nature never will.
Some things directly affect your user experience (like how long it takes to perform all the import jobs on a bin newly populated with a large number of clips, or to preview render important segments) - and some may not (like how long it takes to render your final output - because unless you have a deadline to deliver it, you don’t need to be present, or can be working on other things while that runs in the background).
Some things are expensive for the benefit they bring, and some are not. Some things are actually making what you need to do take more time, and some just feel that way.
If you want to ‘optimise’ your working experience intelligently, you need to actually profile your use to see where the biggest improvements can be made. If you want to throw money at improving it, you need to figure out if your best value lies in throwing it at better hardware in the places where its speed or bandwidth is the limiting problem, or throwing it at developers to optimise the code bottlenecks to better utilise the hardware you have (or next expect to have).
A year ago I did some testing to try to optimize my system for much the same reasons. Here is a summary of what I found at that time for 4K rendering times using a Ryzen 5700x and 32 GB of RAM:
I found a significant speedup as one increases the number of rendering threads up to 6 threads, almost no improvement above that. Note that the Ryzen 5700x is 8 core/16 thread so getting close to the number of cores may be skewing the results. However, increasing the threads to 7 or 8 did not result in a more loaded CPU so the 6 threads may be more of a limit of how many threads rendering can use rather than CPU overloading.
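One way to reason about a plateau like this is Amdahl's law: if only a fraction p of the render work can run in parallel, the speedup on n threads is at most 1 / ((1 - p) + p / n). A small sketch, where p = 0.8 is purely an assumed value for illustration and not something that was measured:

```python
def amdahl_speedup(parallel_fraction, threads):
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


ASSUMED_PARALLEL_FRACTION = 0.8  # hypothetical value, not measured on any system

for threads in (1, 2, 4, 6, 8, 16):
    speedup = amdahl_speedup(ASSUMED_PARALLEL_FRACTION, threads)
    print(f"{threads:2d} threads: at most {speedup:.2f}x")
```

With these assumed numbers the gain from 6 to 8 threads is already small compared with the gain from 1 to 2. Amdahl's law only predicts diminishing returns, though. A hard stop at 6 threads with the CPU not getting any busier would rather point to the renderer not being able to keep more threads occupied.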
Using GPU rendering (NVENC from NVIDIA in my case) rather than CPU rendering sped things up by 30%, even with an old/cheap video card (GTX1660).
Replacing the SATA SSD with a fast NVMe (Kingston KC3000 2TB in my case) produced another 30% speedup. I put my kdenlive project on the NVMe drive while I work on it, then once done editing move it to a large bulk storage HDD for archival purposes.
So my suggestion would be to try to spread the upgrade $ between CPU, GPU, and SSD, since each one seems to significantly help rendering speeds. I did not test faster RAM, but since that could somewhat increase thread speeds, it might be worth going for faster RAM as well.
Yes of course, I’m well aware that there’s no single switch that I could flip to get it faster. But you are right, my experience is a little bit of everything you mentioned.
Once you have your new computer, please let us know what hardware you chose and how happy you are with the results. I’m sure many people doing 4K videos would be interested.
Depends again on what you’re optimising for. What can be done on the GPU is typically optimised for speed, so it’s not unusual to get better compression (quality for a given file size) with CPU rendering for many codecs. Speed/Quality/Size is mostly always going to be a pick-any-two tradeoff.
And at present, many (really most) effects only have CPU implementations, so if you use those, even a modest GPU is likely to be quite underutilised. There’s a lot of interest in improving that, but it’s probably still going to be a fairly long process.
Lots of fast RAM is almost always going to help, as is very fast (and even parallel) storage. But after that it’s really a question of how much money you want to throw at it, and how much benefit you want to see from that investment how soon.
Given that even fairly modest hardware with enough RAM is currently mostly underutilised, and that if you do have plenty of fast RAM then your current machine is on the less modest side of ‘modest’ - the biggest gains by far will be in the “find someone to better optimise the code” bin rather than the “throw more hardware at the problem” one.
A modern Ryzen, more cores, and a nice GPU will definitely be a nicer machine to use for almost everything - but it will also be even more underutilised than your current one until the software side of things improves.
So this is definitely a complex topic. I think that there are 3 areas which should be considered.
Transcoding of clips (e.g. proxy clips) which is done by ffmpeg.
Working in the GUI which is done by kdenlive, probably with MLT support (don’t know).
Rendering which is done by MLT.
ffmpeg seems to be fully multi-threaded so it utilizes all cores as much as possible. So my conclusion would be that a higher number of cores will speed up this process.
About the GUI part I’m not sure. I mean this is how it “feels to work with it”, i.e. does it lag behind or does it instantly do whatever you ask. There is an option for HW acceleration and decoders but I’m not sure what it accelerates (HW decoding means e.g. h264 decompression, I think).
Rendering is a whole story by itself. Meanwhile I looked more closely at what happens on my CPU. Without parallel processing (which is flagged experimental afaik) MLT opens a bunch of threads, but it seems that the work is done within a single thread which runs at ~97% while all other threads do something but only at 2-7%. These threads peak at times but only for short periods. Probably this is when clips are mixed together or composited. This behavior would suggest a CPU which has excellent single-thread capabilities, which Intel CPUs are said to be better at. But if parallel processing is enabled then MLT will use far more threads, which of course speeds up rendering and again suggests more cores.
And since this performance discussion is an everlasting, always recurring topic, I’m thinking about creating a “standard test set” specifically for Kdenlive/MLT/ffmpeg which can then be used by everybody around the world to post their results. This would make a comparison less subjective.
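Such a test set would need a common way to time runs. As a rough sketch of what that could look like, the helper below runs any command line several times and reports the wall-clock times. The function names are hypothetical, and the command itself (for example whatever render or transcode command line you normally use) is left entirely to the caller; the demo at the bottom only times a trivial Python process.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd (a list of arguments) 'runs' times and return wall-clock seconds per run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed runs are never counted.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations):
    """Reduce a list of timings to a few numbers that are easy to post and compare."""
    return {
        "runs": len(durations),
        "median_s": statistics.median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Placeholder command; replace it with the render command you want to compare.
    demo = time_command([sys.executable, "-c", "pass"], runs=3)
    print(summarise(demo))
```

Posting the median of several runs together with the exact project, render settings and software versions would make results from different machines at least roughly comparable.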
Both more CPU cores and codecs with GPU implementations can help here.
You don’t particularly care about quality for proxies as long as it’s sufficient, and processing independent clips doesn’t need parallel algorithms to be maximally done in parallel.
Working in the GUI which is done by kdenlive, probably with MLT support (don’t know).
Most of the GUI is inherently single threaded - partly because Qt, and partly just because it’s that sort of job. But it can and will usefully fork jobs out to the number of cores you have for things like processing bin clips.
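To illustrate the point about independent clips: each clip can simply get its own job, and a pool hands the jobs out to the available cores. This is only a sketch of the idea and not how Kdenlive schedules its proxy jobs; the function names are hypothetical, and the ffmpeg settings shown (540-line height, libx264, veryfast preset) are assumed example values rather than Kdenlive's actual proxy profile.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def proxy_command(source, target):
    """Build an example ffmpeg command line for a small, fast-to-decode proxy (assumed settings)."""
    return [
        "ffmpeg", "-y", "-i", source,
        "-vf", "scale=-2:540",           # downscale to 540 lines, keep aspect ratio
        "-c:v", "libx264", "-preset", "veryfast", "-crf", "25",
        "-c:a", "aac",
        target,
    ]


def run_jobs(clips, job, max_workers=None):
    """Apply 'job' to every clip in parallel and return {clip: result}."""
    clips = list(clips)
    workers = max_workers or os.cpu_count() or 1
    # Threads are enough here when 'job' mostly waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(clips, pool.map(job, clips)))
```

A job function would typically hand proxy_command(...) to subprocess.run. Since each clip is independent, no parallel algorithm is needed inside a single job for the machine as a whole to stay busy.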
Rendering which is done by MLT.
That’s where it gets the most complicated, and where the biggest improvements can be made to utilising hardware better. There are still some things where parallel rendering doesn’t play nice, and even where the GPU is used there are bottlenecks with passing data back and forth from the host. And this affects working in the GUI too, because it’s rendering to the monitors during interactive editing, not just during the ‘final’ render of a project (which can be done in the background while you do other things).
This would make a comparison less subjective.
You can create objective measures to profile - and that can be useful - but that’s still not the same as being representative of what would actually improve someone’s daily experience the most, because small differences in details can make large differences in results. There isn’t really a one-size-fits-all answer to this.
As promised, I’m back with some first feedback about my new custom build, which I have now been working with for a few hours. I also rendered my latest project with various options to test the performance and to get an impression of what it is capable of. It is very capable.
As I wrote in my first post, I originally considered buying a Threadripper, but the total cost of such a system would definitely have exceeded my budget.
So I went with the following components: AMD Ryzen 9950X (16C/32T), ASUS B650E-F, 64 GB RAM running @ 5800 MHz, Samsung Pro 9100 PCIe5 SSD, Powercolor Red Devil Radeon 9070 XT.
And the investment was worth it. It is a huge improvement compared to the Mobile i7 (10C/12T) which I worked on before.
I will work out some performance numbers to give factual feedback. But I can say that I can now work smoothly in the timeline with the 4K footage without proxy clips. This was absolutely impossible on the old system.