What is the bottleneck of Spacer Tool (M) operations in kdenlive?
When I move clips, the CPU usage seems to be low & there seems to be lots of free RAM available too, yet it takes over 10 seconds for Move operations to complete in my current project (Intel i5-10500T, 64GB RAM, MX Linux, 256GB SSD).
I’ve always been curious what the bottleneck is when doing Spacer Tool operations. Anyone know?
Mostly it’s that once you’ve ‘grabbed’ everything to the right of the pointer, every mouse change event the UI sees has to propagate into updating the state data for every grabbed item. And some of those updates may (either in Qt itself, or in our code) also be Accidentally Quadratic in their runtime complexity.
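To put rough numbers on that multiplier, here is a toy model in Python. It is not Kdenlive's or Qt's code; the function name and the 'scan every other clip' step are assumptions made up for illustration. It only counts how many times some piece of state gets touched during a drag.

```python
# Illustrative only: a toy cost model, not Kdenlive's or Qt's actual code.

def drag_cost(num_clips, num_mouse_events, quadratic=False):
    """Count 'state touches' for a drag that moves num_clips grabbed items."""
    touches = 0
    for _ in range(num_mouse_events):
        for _ in range(num_clips):
            # Updating one clip's position costs one touch...
            touches += 1
            if quadratic:
                # ...plus, in the bad case, a look at every other grabbed clip
                # (hypothetical, e.g. a naive neighbour or collision check).
                touches += num_clips - 1
    return touches


linear = drag_cost(2000, 100)
accidentally_quadratic = drag_cost(2000, 100, quadratic=True)
print(linear)                  # 200000
print(accidentally_quadratic)  # 400000000
```

Same drag, same number of clips, but the second case does 2000 times more work, which is the kind of gap that turns 'instant' into 'several seconds'.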
There is already an optimisation in the code that tries to minimise that by ‘faking’ what is really grabbed until the move is complete, and you might be able to improve the performance in practice by zooming/scrolling the timeline so that most of what will be moved is outside the visible space.
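The general shape of that kind of deferral can be sketched like this. This is a hypothetical toy model, not a copy of what Kdenlive actually does: the class, its methods and the `writes` counter are all invented for the example. During the drag only a single pending offset changes, and the per-clip update happens once on release.

```python
# Toy model only: names and structure are hypothetical, not Kdenlive's code.

class SpacerDrag:
    def __init__(self, positions, grab_from):
        self.positions = positions  # clip id -> start frame
        # Everything at or to the right of the pointer gets grabbed.
        self.grabbed = {cid for cid, start in positions.items() if start >= grab_from}
        self.offset = 0   # pending offset, not yet applied to any clip
        self.writes = 0   # number of per-clip state updates performed

    def mouse_moved(self, offset):
        # During the drag only one number changes, however many clips are grabbed.
        self.offset = offset

    def preview_start(self, cid):
        # Where the UI would draw this clip while the drag is in progress.
        start = self.positions[cid]
        return start + self.offset if cid in self.grabbed else start

    def release(self):
        # The real per-clip update happens once, when the move completes.
        for cid in self.grabbed:
            self.positions[cid] += self.offset
            self.writes += 1
        self.offset = 0


positions = {i: i * 100 for i in range(1000)}
drag = SpacerDrag(positions, grab_from=50_000)  # grabs clips 500..999
for step in range(1, 201):                      # 200 mouse events
    drag.mouse_moved(step)
drag.release()
print(drag.writes)  # 500
```

Without the deferral, the same drag would have written 200 x 500 = 100000 clip positions instead of 500.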
That said though, I’ve never noticed it being quite that bad, at least not in any project that wasn’t already so large that many UI operations were suffering a notable and drastic slowdown - so I assume this is indeed a ‘very large’ project (even if it isn’t starving the rest of your system)?
I believe some more optimisations that should help large projects were committed recently. What kdenlive version are you using, and how big is your actual project? Is there anything else ‘special’ about what you’re doing that seems to contribute to the slowdown? (In my cases, very large numbers of keyframes seem to be a significant factor.)
Further improvements probably need a sharable reproducer and some profiling runs to flag the hot sections where gains are most likely to be found.
I might be a couple of kdenlive versions behind; I have to check when I get home. I just saw the new 25 release moments ago & updated this laptop but I need to do the same on the Mini PC that this project was transferred to. On this particular laptop the waits were around double the time that I get on the Mini PC at home (laptop specs: Intel Core i5-7300U, 32GB, 1TB SSD, Intel HD Graphics 620), so I’m not complaining at all, but I am curious about bottlenecks.
The project is 720p (YouTube clips), and I was just about to say I don’t think there’s a crazy amount of stuff going on, but this project was started last year and went through some evolution; I now see that there are actually a lot of clips scattered all over the timeline (as he looks at an older version of the project still on the laptop). It would probably be an idea to start trimming those.
The frame size probably isn’t significant, it’s not re-rendering as this happens so the number of pixels shouldn’t come into play - it’s likely more related to the amount of timeline metadata that needs changing when the position of the clips change.
If you can pin it down to one particular clip, or group of clips, that if you remove makes a notable difference, that would definitely be worth reporting in detail, because what’s ‘special’ about them could be an easy hint for code that needs attention.
Stuff like this is worth noting, because even if it doesn’t get fixed immediately, or isn’t easy to pin down, knowing that it happens can help to recognise problems even when you aren’t specifically working on investigating them.
I haven’t done any deleting of clips yet in the 25 release, but I did start using it. I’ll keep this in mind. When the Spacer Tool is used to move a bunch of clips and I see low CPU usage, I’m guessing that it must be just one or two cores that are maxed, resulting in the low overall average CPU usage (because the other cores are not used). Would that be correct? I suppose I should get a utility showing utilisation per core.
Yeah, on the whole, kdenlive isn’t very heavily multi-threaded, and a lot of the things that it does during ‘normal editing’ operations can’t very usefully be parallelised anyway. It will spin off things like bin jobs (creating thumbnails and proxies etc.) and preview rendering, and some parts of final rendering, but munging the data for a clip group move is somewhat inherently a single threaded operation without a very different data model.
My first bet would be that most of that time is spent crawling through data structures just to find the bits that need to be changed - that’s where the runtime multipliers usually are. Finding and, where possible, fixing or avoiding those inefficiencies is where the biggest gains are likely to be made, unless there really is just one very big oops hiding somewhere in that chain. But profiling a problem case is really the only efficient way to know what would actually be worth optimising.
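As a generic illustration of what 'crawling just to find the bits' costs (this is not taken from the Kdenlive source; the clip records and the `find_by_scan` helper are made up for the example), compare walking a list to find one clip with looking it up in an index:

```python
# Generic illustration; the data layout here is an assumption, not Kdenlive's.

def find_by_scan(clips, clip_id):
    """Return (clip, comparisons) by walking the list until the id matches."""
    comparisons = 0
    for clip in clips:
        comparisons += 1
        if clip["id"] == clip_id:
            return clip, comparisons
    return None, comparisons


clips = [{"id": i, "start": i * 25} for i in range(5000)]

# Walking the list: the last clip costs 5000 comparisons to find.
_, scan_steps = find_by_scan(clips, 4999)

# Building an index once makes each later lookup a single hash probe.
index = {clip["id"]: clip for clip in clips}
indexed = index[4999]

print(scan_steps)        # 5000
print(indexed["start"])  # 124975
```

Do the scan once per moved clip and the lookups alone are already quadratic in the number of clips, before any real work has been done.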
Plain vanilla top should be able to show per-core utilisation if you press the right button, though it depends a bit on which implementation/version your distro ships with. With the one I have today, ‘1’ will toggle stats for individual CPU threads.
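If your top doesn't have that toggle, the same per-CPU numbers can be worked out on Linux from two snapshots of /proc/stat. A minimal sketch follows; the function names are my own, and it counts iowait as idle time, which is a choice rather than the only way to do it.

```python
import time


def parse_proc_stat(text):
    """Map 'cpuN' -> (busy_ticks, total_ticks) from the contents of /proc/stat."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-CPU lines such as 'cpu0'; skip the aggregate 'cpu' line.
        if not fields or not fields[0].startswith("cpu") or not fields[0][3:].isdigit():
            continue
        # Columns: user nice system idle iowait irq softirq steal
        # (the guest columns that follow are already included in user/nice).
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values)
        stats[fields[0]] = (total - idle, total)
    return stats


def per_core_busy(before_text, after_text):
    """Percent busy per logical CPU between two /proc/stat snapshots."""
    before = parse_proc_stat(before_text)
    after = parse_proc_stat(after_text)
    usage = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        usage[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return usage


def sample_live(interval=1.0):
    """Linux only: take two real snapshots `interval` seconds apart."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.read()
    return per_core_busy(before, after)
```

Calling `sample_live()` while a Spacer Tool move is in progress returns something like a dict of `cpu0`, `cpu1`, ... percentages. As a sanity check on the averaging: one logical CPU at 100% with the rest idle shows up as only about 8% overall on a machine with 12 logical CPUs (100 / 12).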
That still might not give you a very clear picture though, because I don’t think kdenlive sets CPU affinity for its threads (and it probably shouldn’t), so the logical thread that is busy doing all this work still might be bouncing between multiple “CPUs” each time the scheduler gives it a time-slice to do some work, and between different wait states as some blocking instruction occurs (like waiting to page in or access memory etc.). This isn’t a pure computation running in the CPU, it’s a bulk update of lots of memory which itself is very slow relative to CPU instruction runtime.