Notifications crash and plasma apps freeze after some uptime

Hey,

Various parts of my KDE session start crashing after a few hours of uptime, and since I am pretty new to KDE (I switched to it not too long ago), I am a bit stuck on where to debug further.

The behavior is pretty strange to me. There is no exact time when it starts, but it usually happens after a few hours of uptime, and so far I have only been able to fix it with a reboot. It starts with the notifications daemon silently stopping. From then on, each time an app tries to trigger a notification, like my music player when the song switches, the app gets stuck and I can see timeouts being logged to /var/log/messages. All KDE-native apps, Dolphin for example, have trouble starting up and sometimes hang for up to 30 seconds. The same is true for the start menu and so on. I also noticed that the start menu seems to forget about all installed applications. For instance, when I want to open Slack from the menu while the system is in this weird state, the app is not even shown when I search for it.
Applications with their own rendering technology that are not tied to KDE keep working just fine. I just cannot switch between them via the task bar anymore, because it freezes as well. ALT + TAB still works, though.
When an application freezes for a couple of seconds before the notification timeout is logged, the whole app window is greyed out, and the colors return after the timeout. Not sure if this is important.

I checked dmesg, /var/log/messages and the journal for plasmashell. As mentioned, so far I can only see some timeout errors being logged to /var/log/messages when I get into that situation, but that’s about it.

I am not sure if the notifications daemon is the root of the issues, or if it’s just the first symptom I am noticing. For now, I am basically asking for advice on where I could debug further, as the plasmashell itself does not log any errors.

With the upgrade to Plasma 6.4 I started seeing OOM kills for baloo, because it was using lots of swap even when main memory had more than enough space left. It was using more than 8GB of memory (RAM + swap) when it was killed. However, after increasing zram to RAM / 4, this issue was gone.

I do have these issues on only one system. They started a few weeks ago, but I am not sure with which version exactly it began. The current setup is:

Fedora 42
KDE Plasma 6.4.1
KDE Frameworks 6.15.0
Qt Version 6.9.1
Kernel 6.15.4-200.fc42.x86_64
Wayland

Hardware:

AMD Ryzen 9950X
64GB DDR5 unbuffered ECC memory
Radeon RX 7800XT

If anyone has some advice where / how to debug further, that would be very helpful.

Thanks!

Given the issues seem to be spread across so many applications I am wondering if these are all consequences of some system issue.

E.g. RAM failure at certain addresses, disk errors resulting in incorrect program/library code being loaded.

The last time I saw this kind of behavior myself, I had an SSD that was close to failing.

That was my idea as well, but the system has not reported any errors in a long time. The last issue rasdaemon reported was 3 months ago, and it was just a timeout on a NIC. The SSD reported some bad blocks when I did a manual full SMART check in January, that’s it. No memory errors or anything else.

Also, if this were the case, the behavior would probably be different and inconsistent. It happens every day after the machine has been running for a few hours. It’s always the same, and it’s basically “reproducible” by just waiting.

The most obvious thing is that the notifications window does not appear anymore, and of course there are these freezes in all apps that (probably) use KDE under the hood somehow. Apps that use a webview freeze, and Slack for example becomes completely unusable at that point, while my IDE and browser work perfectly fine without any issues at all.

So I am pretty sure it’s no hardware issue, and ras-mc-ctl does not report anything. Just to be sure, I will run memtest86 overnight, and maybe another SMART test, even though the SSDs are all not even 1 year old.

Different apps reacting differently could point to some shared code being affected by the hardware issue.

Say corrupted bits of code/data in Qt, or the memory location into which it is loaded having bitflips, etc.

Have you tried without zram?

Not yet. Fedora enables it by default and I kept it for now. I have enough physical memory that I should never run into issues, so I might even try without swap at all. I ran memtest (incl ECC error injection) last night over multiple hours and it reported not a single error, so that’s definitely fine.

I booted today with as much as 32GB of zram, because bumping it to 16GB solved the baloo OOM kills before. The thing is that the notifications daemon used to report the OOM each time, but now the notifications themselves break at some point. Because it happens for so many apps, my idea was also that it might be something with the daemon itself. I am pretty sure that this is what breaks my music player, for instance. Each time a new song starts, it triggers a small notification with the song name. When the machine is in the broken state, playback works just fine until it moves to the next song. I can then see the notification timeout in the log, and only after the attempt to push the notification has been killed by the timeout does it finally start playing the next song.

It’s the same for Slack. It of course triggers a notification with a new message each time. And this as well freezes very badly.

I will now trigger a long SMART check and see how it behaves with more zram. If it freezes as well, I will give it a try without swap at all next time.

Is there any way to manually restart the internal notification daemon without killing the whole plasmashell, or does it depend on the notifications being available all the time?

As far as I can tell, the process that provides the D-Bus interface for notifications is plasmashell.

You could check if clearing the notification history helps.

My setup sometimes runs into an issue when notifications will no longer disappear by themselves. Clearing the history helps.
Apparently after a week or so the number of retained notifications can sometimes be too much.

I will check that, thanks. The only app that frequently adds new ones is my music player, but at least it looks like it always updates an existing notification and does not add a completely new one each time. So there are usually at most a handful of notifications.

The extended SMART test just finished for the whole OS disk without any errors. So this is absolutely fine as well.

The annoying part about this issue is that testing something new takes a long time. I need to run everything for a couple of hours before it happens again, and of course I am testing one change at a time. Otherwise, if I can fix it at some point, I would not know what solved it.

I am currently testing more zram, and if the issue persists, I will try with none at all. I will report back with the results. Thank you so far already.

Unfortunately, even with 32GB of zram, I ran into the issue again after an uptime of 4:10 hours. Active memory usage was between 20 and 30G, but caches were using everything left over until 60.4G. Swap was at 10.8 / 30.2G.

It was the exact same behavior as described above. I noticed something new though. Even though the notifications window does not appear anymore, notifications that should only play a sound keep working.

Right when I noticed it the first time, I saw those in /var/log/messages:

Jul  7 13:29:13 penguin kwin_wayland[3491]: kwin_core: XCB error: 3 (BadWindow), sequence: 31341, resource id: 29462060, major code: 129 (SHAPE), minor code: 3 (Combine)
Jul  7 13:29:13 penguin kwin_wayland[3491]: kwin_core: XCB error: 3 (BadWindow), sequence: 31350, resource id: 29462060, major code: 129 (SHAPE), minor code: 3 (Combine)
Jul  7 13:29:24 penguin flatpak[5180]: [2:0707/132924.858454:ERROR:libnotify_notification.cc(50)] notify_notification_close: domain=186 code=24 message="Zeitüberschreitung wurde erreicht"
Jul  7 13:29:49 penguin flatpak[5180]: [2:0707/132949.890885:ERROR:libnotify_notification.cc(50)] notify_notification_show: domain=186 code=24 message="Zeitüberschreitung wurde erreicht"

The Flatpak app throwing an error there is my music player (the German message “Zeitüberschreitung wurde erreicht” means “timeout was reached”). Then, about 3 minutes later, I saw some errors from plasmashell:

Jul 07 13:32:28 penguin plasmashell[3756]: kf.runner: Error requesting matches; calling "org.kde.KWin"  : "org.freedesktop.DBus.Error.NoReply" "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
Jul 07 13:32:28 penguin plasmashell[3756]: kf.runner: Error requesting matches; calling "org.kde.runners.baloo"  : "org.freedesktop.DBus.Error.NoReply" "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
Jul 07 13:32:29 penguin plasmashell[3756]: kf.runner: Error requesting matches; calling "org.kde.KWin"  : "org.freedesktop.DBus.Error.NoReply" "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
Jul 07 13:32:29 penguin plasmashell[3756]: kf.runner: Error requesting matches; calling "org.kde.runners.baloo"  : "org.freedesktop.DBus.Error.NoReply" "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
Jul 07 13:32:30 penguin plasmashell[3756]: kf.runner: Error requesting matches; calling "org.kde.KWin"  : "org.freedesktop.DBus.Error.NoReply" "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
Jul 07 13:32:30 penguin plasmashell[3756]: kf.runner: Error requesting matches; calling "org.kde.runners.baloo"  : "org.freedesktop.DBus.Error.NoReply" "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
Jul 07 13:32:30 penguin plasmashell[3756]: kf.runner: Error requesting matches; calling "org.kde.KWin"  : "org.freedesktop.DBus.Error.NoReply" "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
Jul 07 13:32:30 penguin plasmashell[3756]: kf.runner: Error requesting matches; calling "org.kde.runners.baloo"  : "org.freedesktop.DBus.Error.NoReply" "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
Jul 07 13:32:33 penguin plasmashell[3756]: kf.kio.gui: Failed to launch process as service: "app-kwrite@e241d2e576ef4b2ca37be6f57b7bf068.service" "org.freedesktop.DBus.Error.NoReply" "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
Jul 07 13:34:29 penguin plasmashell[3756]: QGridLayoutEngine::addItem: Can't add Body_QMLTYPE_684(0x55a5ad059650, id="bodyLabel", parent=0x55a5ad68c1e0, geometry=0,0 47.0156x18) at cell (1, 0) because it's already taken by Summary_QMLTYPE_689(0x55a5abf83260, id="summary", parent=0x55a5ad68c1e0, geometry=0,0 33.4844x21)
Jul 07 13:34:34 penguin plasmashell[3756]: QSocketNotifier: Socket notifiers cannot be enabled or disabled from another thread
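
To see which D-Bus services stop answering, the NoReply lines above can be tallied per called service. This is only a minimal Python sketch, assuming the journal format shown above; the function name and the shortened sample lines are illustrative, and only the "calling ..." style of message is counted:

```python
import re
from collections import Counter

# Matches plasmashell lines of the form:
#   ... calling "org.kde.KWin"  : "org.freedesktop.DBus.Error.NoReply" ...
NO_REPLY = re.compile(
    r'calling "(?P<dest>[^"]+)"\s*:\s*"org\.freedesktop\.DBus\.Error\.NoReply"'
)


def count_no_reply(lines):
    """Count D-Bus NoReply errors per called service name."""
    counts = Counter()
    for line in lines:
        match = NO_REPLY.search(line)
        if match:
            counts[match.group("dest")] += 1
    return counts


# Two lines from the log above, cut off after the error name.
SAMPLE = [
    'Jul 07 13:32:28 penguin plasmashell[3756]: kf.runner: Error requesting matches; '
    'calling "org.kde.KWin"  : "org.freedesktop.DBus.Error.NoReply"',
    'Jul 07 13:32:28 penguin plasmashell[3756]: kf.runner: Error requesting matches; '
    'calling "org.kde.runners.baloo"  : "org.freedesktop.DBus.Error.NoReply"',
]

for dest, hits in count_no_reply(SAMPLE).items():
    print(dest, hits)
```

Fed with a full journal export instead of the sample, this would show whether the timeouts are limited to a few services or affect everything on the session bus.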

The notification history contained 5 notifications, that’s it. I cleaned them up, but that didn’t change anything. I saw those plasmashell errors only once, but the freezes and all the symptoms I already described were unchanged. Shortcuts, like the one for opening the start menu from the taskbar, do not work anymore, but I can open it with the mouse. If I then search for e.g. kwrite, though, it does not even know the app anymore and only offers to “execute command kwrite”, which then works of course.

Deactivating swap or having less than 16GB of it brings new issues. When I run swapoff -a, kde-baloo is almost immediately OOM killed over and over again, even when only 6 / 64GB of memory are in use.

Hmm.

You could try suspending or disabling Baloo indexing.

After the kernel OOM killed it, I decided to run /usr/libexec/kf6/baloo_file manually, so it would not have the memory restriction from the systemd service file. It went completely crazy, using 100% CPU (1 core) and up to 15GB of memory. After 30 minutes it was still going, even though I had already excluded my development folders with all the source code and generated blobs from compilation. The process that went so crazy was /usr/libexec/kf6/baloo_file_extractor.

Adding just the line “only basic indexing=true” to ~/.config/baloofilerc solved that issue, but baloo was still using the maximum configured amount of memory immediately after startup. At least that fixed the OOM killing, but I decided to disable it completely for now. I don’t know if this was its own problem or linked to the issue of other apps freezing, but I will keep it like that and see what happens.

When I look at /usr/lib/systemd/user/kde-baloo.service with its MemoryHigh=512M, it makes a lot of sense that it’s OOM killed so quickly, given that it went up to 15GB when I ran it manually. It probably also makes sense then that it uses so much swap, even when there’s so much free physical memory. It’s not allowed to go over that threshold.
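
To put those two numbers side by side, here is a rough Python sketch that converts the systemd size to bytes and compares it with the observed peak. It assumes that “15GB” means 15 GiB, and the helper name is made up for illustration. The systemd documentation describes MemoryHigh= as a throttling limit above which memory is reclaimed aggressively, which would fit the heavy swap usage:

```python
import re

# systemd size suffixes are base 1024.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_systemd_size(value):
    """Convert a systemd size such as '512M' to bytes.

    Returns None for 'infinity'. Percent values are not handled here.
    """
    value = value.strip()
    if value == "infinity":
        return None
    match = re.fullmatch(r"(\d+)([KMGT]?)", value)
    if not match:
        raise ValueError(f"unsupported size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


limit_bytes = parse_systemd_size("512M")  # value from kde-baloo.service
observed_bytes = 15 * 1024**3             # assumption: "15GB" read as 15 GiB
ratio = observed_bytes / limit_bytes
print(f"observed peak is {ratio:.0f}x the MemoryHigh limit")
```

So the unrestricted run wanted roughly 30 times what the service unit allows.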

So, I will keep baloo disabled for now, wait another few hours and see what happens. :)

After another test, it became pretty clear that baloo does some weird things under the hood. I had it disabled this time and my machines are usually configured with something like vm.swappiness = 5, so that the OS only makes use of swap as a last resort. All of them have enough memory that swap should never be needed.
However, if baloo is enabled, it starts to use the swap pretty much immediately and 8 GB is not even enough. When main memory is at 5 / 64GB with vm.swappiness = 5, and baloo already fills up swap, something is not working as expected. I had it disabled in this test and not even a single byte of swap was used, so it was all baloo beforehand.
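
To confirm which process actually owns the swap, the VmSwap field in /proc/<pid>/status can be summed up per process. A minimal Python sketch, assuming a Linux /proc layout; the function names are illustrative:

```python
import os
import re


def parse_vmswap_kb(status_text):
    """Return the VmSwap value in kB from /proc/<pid>/status text, or 0 if absent."""
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def parse_name(status_text):
    """Return the process name from /proc/<pid>/status text."""
    match = re.search(r"^Name:\s+(.*)$", status_text, re.MULTILINE)
    return match.group(1) if match else "?"


def swap_by_process(proc_root="/proc"):
    """List (swap_kb, name, pid) for all processes using swap, largest first."""
    usage = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # process exited or is not readable
        swap_kb = parse_vmswap_kb(text)
        if swap_kb:
            usage.append((swap_kb, parse_name(text), int(entry)))
    return sorted(usage, reverse=True)
```

Calling swap_by_process()[:10] while baloo is enabled should list baloo_file or baloo_file_extractor at the top if it really is the only swap consumer.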

The main point, though, is that baloo was a separate issue, and disabling it did not solve the freezing. This time it started after 3:38 hours, and the machine was even idle for the first hour.
I am pretty sure that something kde-related is crashing, not only because of the notifications daemon, but also because of the start menu. All the icons for apps are still there, but when I use the search, it forgot about everything, even the apps that exist with their own icon 30px right below the search input.

Any other ideas what else I could check?

I switched to XFCE for the last 2 days and did not have a single issue, so it’s definitely something with KDE, but I have no idea what / where / why.

I have the same problem that the notification part of plasmashell enters a non-functional state, where applications trying to send notifications will freeze until the D-Bus timeout occurs:

flatpak[6276]: [2:0714/111428.747170:ERROR:libnotify_notification.cc(50)] notify_notification_show: domain=186 code=24 message="Zeitüberschreitung wurde erreicht"

Even the KDE system settings application crashes once started:

Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. 

Navigating through the D-Bus user-session reveals that the object tree for org.freedesktop.Notifications is empty and connecting to the D-Bus name fails with a timeout.
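
The same check can be scripted, for example to log the moment the service stops answering. This is only a sketch: it assumes the gdbus tool from GLib is installed and calls GetServerInformation, a method from the Desktop Notifications specification, with a short timeout. The function names are illustrative:

```python
import subprocess


def build_probe_command(timeout_s=5):
    """Build a gdbus call that asks the notification service for its identity."""
    return [
        "gdbus", "call", "--session",
        "--dest", "org.freedesktop.Notifications",
        "--object-path", "/org/freedesktop/Notifications",
        "--method", "org.freedesktop.Notifications.GetServerInformation",
        "--timeout", str(timeout_s),
    ]


def probe_notifications(timeout_s=5):
    """Return a short status string describing whether the service replied."""
    try:
        result = subprocess.run(
            build_probe_command(timeout_s),
            capture_output=True,
            text=True,
            timeout=timeout_s + 5,  # safety net in case gdbus itself hangs
        )
    except FileNotFoundError:
        return "gdbus not installed"
    except subprocess.TimeoutExpired:
        return "no reply (probe itself timed out)"
    if result.returncode == 0:
        return "ok: " + result.stdout.strip()
    return "failed: " + result.stderr.strip()
```

Running probe_notifications() once a minute and writing the result with a timestamp to a file would narrow down when the daemon goes away, without waiting for an app to freeze first.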

Anything related to KDE and D-Bus either fails with an error, freezes or outright dies when invoked. Getting Spectacle to run is almost impossible with either no Spectacle invocation, huge delays after it shows itself or a dying System Settings application in the background (which is weird). I can hear the error sound of KDE when the timeout occurs but no window or error is shown or logged.

The only remedy that helps at this point is restarting the Plasma session, restarting the D-Bus user-session (which also kills the current Plasma session) or just rebooting to have a clean slate.

Having the same error as @sebadob might rule out a hardware-related issue.

System Details:

Operating System: Arch Linux 
KDE Plasma Version: 6.4.2
KDE Frameworks Version: 6.15.0
Qt Version: 6.9.1
Kernel Version: 6.15.5-arch1-1 (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 PRO 8840U w/ Radeon 780M Graphics
Memory: 32 GiB of RAM (30.0 GiB usable)
Graphics Processor: AMD Radeon 780M Graphics
Manufacturer: LENOVO
Product Name: 21MCCTO1WW
System Version: ThinkPad T14 Gen 5

Best regards!

What you describe is exactly my issue, yes.

It’s no hardware defect for sure. I checked everything in detail, and having been on XFCE for a few days without a single tiny issue makes me very confident that it’s something at least related to what KDE does.

I did not have the issue with 6.2, but I am not sure if it started with 6.3 or 6.4.

I uninstalled plasma-integration-qt5 today. I don’t know why it was there, probably for compatibility with some apps. Removing it also uninstalled some apps I’m not using anyway, like kamoso.

I now made it to an uptime of 6 hours without issues. I am not sure, if that was the “solution”, but it looks promising so far.

Unfortunately, it did not solve it. After 1 more hour this time, I am in “freezeland” again. :(

Very strange.

My system (KDE Neon, Plasma 6.4.2) runs almost 24/7 without issues

Today it even happened after 2:49 hours. I will try a complete re-install of KDE as a whole, and if that does not work, I have no ideas left and probably need to switch desktop environment.

Given that you and @0x7FFF are on different distributions, I am wondering if this could be caused by something you both have running but I do not.

Both your logs show an error from a flatpak application failing to send a notification.
Doesn’t mean that it is causing the issue but it could hint at an issue in KDE’s desktop portal implementation.

I usually only have Flatpak Zoom running but that doesn’t create a lot of notifications.