I should add that I tried to revert back to 515, but Neon refused and simply reinstalled 525. I then purged 525 again and installed the newer 535, but the problem persists. I am not sure if I can force it back to 515 if I revert the kernel back to 5.19. I am hesitant to try as this is my daily driver workstation. I do not want to brick it.
It’s hard for me to test this because I don’t have a second Nvidia GPU, but I’ll see if I can narrow down the issue. If you’re interested, you can find the relevant code here:
Since the static information is properly set, the issue is definitely in the code that runs nvidia-smi dmon -s pucm, which retrieves GPU information regularly. It seems like it either doesn’t work for the second GPU index, or that the data received is not properly processed in the second GPU instance. I’ll keep looking into this.
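To illustrate what per-index processing of that output could look like, here is a minimal C++ sketch. It is not the ksystemstats code: the function name rowsByGpuIndex is hypothetical, and the sketch assumes only that header lines start with # and that each data row starts with the GPU index. It makes no assumption about which columns follow.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of ksystemstats): groups the data rows of
// `nvidia-smi dmon` style output by the GPU index found in the first column.
// If an index appears more than once, the most recent row wins.
std::map<int, std::vector<std::string>> rowsByGpuIndex(const std::string &output)
{
    std::map<int, std::vector<std::string>> rows;
    std::istringstream lines(output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string first;
        // Skip blank lines and header lines (assumed to start with '#').
        if (!(fields >> first) || first[0] == '#') {
            continue;
        }
        int index = 0;
        try {
            std::size_t used = 0;
            index = std::stoi(first, &used);
            if (used != first.size()) {
                continue; // first column is not a plain integer
            }
        } catch (const std::exception &) {
            continue; // not a number at all, or out of range
        }
        // Store the remaining whitespace-separated fields for this GPU.
        std::vector<std::string> values;
        std::string field;
        while (fields >> field) {
            values.push_back(field);
        }
        rows[index] = values;
    }
    return rows;
}
```

Keying rows by the parsed index rather than by arrival order is one way to make sure a second GPU's row can never be attributed to the first, which is the kind of mix-up being investigated here.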
For the sake of good information keeping, could you file a bug report on bugs.kde.org? It would be filed against ksystemstats, and it would be good to include all the information provided here. I’d follow up on it there, and should I not solve the issue it would at least be known to others.
It would be my pleasure. I will head over to bugs now. I grabbed the entire source for ksystemstats. It might be worthwhile tracking down the bug for the original whitespace problem, determining what was done to fix it, and seeing whether it has somehow become the problem again.
I just ran another “out.txt” in Blender with a full 100% usage CUDA load on a project with known values, and everything looks completely normal. Power, temps, clocks, and usage on both GPUs are consistent with previous values.
If I were to suggest to you to change parts of the code, would you be able to compile ksystemstats and test?
Remember, on your own system you would need to check out the 5.27 branch.
I’d also like to clarify some stuff:
GPU 1 reports wrong values, and GPU 2 reports no values.
It looks to me like the GPU 1 information is correct, is it not? And, did the second GPU data work (specifically, temperature and other dynamic information) before?
For sure. I would be happy to. It would simply be a matter of ensuring I have all the needed dev packages and dependencies (like the above-mentioned ECM). It has been many years since I did any dev work, so I may need a bit of hand-holding.
No, GPU 1 is showing about 20 degrees C too cool, the GPU clocks are reported about 4 times lower than actual, and GPU usage only ever gets to about 40% unless both cards are running at 100%, at which point it shows nearly the right value for GPU 1.
Honestly, it seems like it’s dividing GPU 1 by GPU 2 if GPU 2 is idle, if that makes any sense.
Perhaps it might be best to boot your system into KDE Neon Developer Edition, so you have everything there. The latest version is out-of-date, which is good in this case because it’s all Qt5/KF5 out of the box.
If you boot into the developer edition, and build ksystemstats on the 5.27 branch, it should work. The lines I’d like to try to remove are:
bool ok;
int index = parts[0].toInt(&ok);
if (!ok) {
    return;
}
This is probably too much to ask, to be honest, and so I’ll see if removing those lines breaks the single-GPU case and get them into the next 5.27 release if not.
Are you sure the reported GPU information (via System Monitor/applet/my plasmoid) disagrees with the output of nvidia-smi dmon -s pucm? If that output is wrong, then the issue isn’t ours.
It is not too much to ask at all. I have been playing with compiling KDE in Gentoo (only took 2 hours). I would love to solve this problem and learn more about how this works. I am downloading the developer ISO. I will install it on an external HDD. It might take a bit of time to get sorted, so as long as you are ok with that, so am I.
My apologies, I wasn’t very clear. Your plasmoid is reporting the correct temperature for GPU 1. It reports 0 for GPU 2. Other desktop widgets/applets/plasmoids, including any GPU-enabled sensors on System Monitor pages, all report wrong and missing values. Yours and Psensor are the only ones that show the correct temp for GPU 1.
That is really strange. Could you show me a screenshot of System Monitor/System Monitor sensor disagreeing with my applet, for the same sensor?
This shouldn’t be possible.
OK, this is strange… I disabled GPU2 for CUDA rendering, and ran a render.
GPU1’s data did not change in System Monitor… at all.
I then disabled GPU 1 and re-enabled GPU 2, and the sensor came alive. I realized that all of GPU1’s sensor output is actually GPU2’s, and correct.
Now I understand what is happening. When I am only using GPU 1, I am seeing the idle data of GPU2. Temp, freq, pwr…
Here is a screen of the temp diff.
I assume disabling/enabling GPUs changes the index in nvidia-smi, resulting in that breaking. The good news is my potential fix doesn’t break things, so I’ll make an MR and we can see if this now works in 5.27.8.
I only changed it in Blender to try to understand what was happening. I did not disable it in the system.
So, basically, the System Monitor GPU sensors think GPU2 is GPU1 and report GPU2’s sensor output as GPU1’s. They are not showing any data from GPU1.
Which raises the question… why then is your applet showing the correct temps for GPU1, and 0 for GPU 2?
Thank you, by the way. I sincerely appreciate your help with this.
It uses hardcoded sensors as added by the user, e.g. gpu/gpu0/temperature. If the sensor is not available, it simply shows “-”. System Monitor might have more complicated behaviour.
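As a rough illustration of that lookup-with-fallback behaviour, here is a small C++ sketch. It is not the applet's code: gpuSensorId and displayText are hypothetical names, and it is an assumption that other GPUs follow the same gpuN naming as the gpu/gpu0/temperature id quoted above.

```cpp
#include <map>
#include <string>

// Hypothetical helper: builds a sensor id in the style quoted in the thread,
// e.g. gpuSensorId(0, "temperature") gives "gpu/gpu0/temperature".
// That other GPUs use the same gpuN pattern is an assumption.
std::string gpuSensorId(int gpuIndex, const std::string &property)
{
    return "gpu/gpu" + std::to_string(gpuIndex) + "/" + property;
}

// Looks up a hardcoded sensor id among the currently available sensors and
// falls back to "-" when that id is not available.
std::string displayText(const std::map<std::string, std::string> &available,
                        const std::string &sensorId)
{
    const auto it = available.find(sensorId);
    return it == available.end() ? std::string("-") : it->second;
}
```

With a fixed id, the display depends only on whether that exact id is being published, so a widget built this way cannot silently pick up another GPU's sensor.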
As mine seems to be a somewhat unique case scenario, it might be wise to get Neon Development (I think the external SSD has Kubuntu 23.04 on it, so it will have to go bye bye) up and working so I can at least help test changes. Is there a group or channel I should look to for advice on getting it all set up correctly, that is patient with hamfisted noobs?
#kde-welcome:kde.org
My merge request is here:
Well, I compiled ksystemstats (which also compiled 37 dependencies) using kdesrc-build. I commented out the lines from the source as directed, but kdesrc-build claimed there were no changes, but was compiling anyway.
How would I check to see if it works, and would I need to install all the compiled deps?
I wonder if I should do a full Plasma compile and run a complete session as suggested in the docs.
Running a complete session is the best way to test this; for some reason ksystemstats wouldn’t work for me otherwise.
but kdesrc-build claimed there were no changes, but was compiling anyway
Strange, did it remove them? It may do so, in which case you might need to pass --no-src.
For a full plasma session, do
kdesrc-build plasma-workspace plasma-framework plasma-integration bluedevil powerdevil plasma-nm plasma-pa plasma-thunderbolt plasma-vault plasma-firewall plasma-workspace-wallpapers kdeplasma-addons krunner milou kwin kscreen sddm-kcm plymouth-kcm breeze discover print-manager plasma-sdk kdeconnect-kde plasma-browser-integration xdg-desktop-portal-kde kde-gtk-config kgamma5 breeze-gtk drkonqi phonon flatpak-kcm kactivitymanagerd --include-dependencies
and then:
kdesrc-build plasma-desktop systemsettings plasma-disks plasma-systemmonitor ksystemstats kinfocenter kmenuedit --include-dependencies
and then:
~/kde/build/plasma-workspace/login-sessions/install-sessions.sh
Make sure to build ksystemstats with the changes after all that:
kdesrc-build --no-src --no-include-dependencies ksystemstats
This is the output:
Building ksystemstats from kf5-workspace-modules (38/38)
Fetching remote changes to ksystemstats
Merging ksystemstats changes from branch Plasma/5.27
* You had local changes to ksystemstats, which have been re-applied.
No changes to ksystemstats source code, but proceeding to build anyway.
Compiling… succeeded (after 1 second)
Installing… succeeded (after 0 seconds)
<<< PACKAGES SUCCESSFULLY BUILT >>>
I will compile the whole thing, then test it, then try to recompile ksystemstats as suggested. Thanks!