Baloo Safeguards and Errors

Baloo has given me many headaches over the years, and it is quite a difficult beast to debug.

I have now finally managed to find one of the major issues that has been breaking Baloo for me.

The thing is: like it or not, Baloo has a bad reputation, and it is commonly recommended to turn it off entirely or to set it to index file names only. I think part of the blame comes down to it being a background utility that you only ever notice when it runs into issues. A second thing is that it does not really recognize failure states: for instance, it generated a 42GB index for a 300GB folder on my desktop PC. I only noticed because it did the same thing on my laptop and caused significant battery drain.

So here are a few suggestions to help Baloo get a better reputation.

Disabling it on battery

It has dropped my battery life from 6 hrs to 1.5 hrs. Please just … not let that happen.

Add warnings for weird behavior

Baloo ran into Memory Limit (for an hour)

Directory (XYZ) takes unusually long to index/has unusually large index space, do you want to exclude it from the index?

Baloo has been stuck for 2 days while indexing. You may want to look at this troubleshooting guide.

I think throwing some desktop notifications when weird things happen, or at least displaying them on the settings page, would be a good idea, so it does not just keep quietly failing and eating up your disk, CPU and battery.
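To make the second warning a bit more concrete, here is a rough sketch in Python of what such a check could look like. This is purely illustrative: none of these names exist in Baloo, I am assuming the indexer could attribute index space to a folder at all, and the 5% threshold is a guess, not a measured value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FolderStats:
    path: str
    data_bytes: int   # total size of the files in the folder
    index_bytes: int  # index space attributed to the folder (hypothetical metric)


def index_size_warning(stats: FolderStats, max_ratio: float = 0.05) -> Optional[str]:
    """Return a warning text if the index looks suspiciously large, else None."""
    if stats.data_bytes <= 0:
        return None  # nothing to compare against
    ratio = stats.index_bytes / stats.data_bytes
    if ratio <= max_ratio:
        return None
    return (
        f"Directory {stats.path} has an unusually large index "
        f"({ratio:.0%} of its size). Do you want to exclude it from the index?"
    )
```

With the numbers from my desktop (a 42GB index for a 300GB folder) the ratio is 14%, so a check like this would have fired long before I noticed anything myself.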

Add safeguards against weird behavior

I have a lot of trouble coming up with good metrics here. The plaintext search, for instance, is a double-edged sword. On the one hand it is great for e-books. On the other hand it tried to index my .obj files and caused a huge mess.
I know it ignores some file types already and this is a good thing.

I just feel like there may be a smart solution to define some unwanted behaviors and stop baloo from taking too much time and space to index.
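As a very rough idea of what I mean, a per-file guard could fall back to name-only indexing for file types or sizes that are known to cause trouble. The Python sketch below is an assumption on my part and not how Baloo decides anything today; the suffix list, the 50 MiB limit and the function name are all made up for illustration.

```python
from pathlib import PurePath

# Hypothetical defaults; neither the names nor the values come from Baloo.
SKIP_CONTENT_SUFFIXES = {".obj"}
MAX_CONTENT_BYTES = 50 * 1024 * 1024  # 50 MiB


def index_mode(path: str, size_bytes: int) -> str:
    """Pick full-text indexing ("content") or file-name-only indexing ("name")."""
    suffix = PurePath(path).suffix.lower()
    if suffix in SKIP_CONTENT_SUFFIXES:
        return "name"  # known-problematic file type
    if size_bytes > MAX_CONTENT_BYTES:
        return "name"  # too large to be worth indexing as plain text
    return "content"
```

The file would still be findable by name, so nothing disappears from search; it just stops the content indexer from chewing on files that were never meant to be read as text.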

Increase the memory limit

I think it is safe to give it a bit more RAM if that means indexing can run more smoothly.


Some safeguards like the ones you mention seem like a good idea, but I think in general the problem is very hard to solve in a way that works for everyone.

I think it’s best for indexing things like letters that you wrote or received, and similar documents. So that if you want to find your correspondence with your bank, you can just search for “bank” or its name.

Ebook search (which you point out in particular) is something that’s great for some kinds of uses, but not for others. Someone who bought a lot of Humble Bundles, or got a lot of Public Domain/CC/free books, can quickly get to the point where searching through books is pointless, as you’ll drown in false positives for anything you might want to search, and it’ll poison the search through your personal files (as the bank-related search results get swamped by memory banks, river banks, etc.). Anyone who has a lot of data won’t get around having to do some (or a lot of) tweaking of the inclusion/exclusion criteria.


IMO making the index opaque, and maintained only by the baloo indexer, is what makes this case a problem. If the index format was open, one could install different indexers or one’s own scripts to clear out the false positives.

(Of course, this would facilitate diagnosis of baloo’s failures, and detailed bug reports…)

It suspends indexing for me on battery.

This is always a good idea for this sort of user-facing tool.

I’m not quite sure how this would help.

Excluding things from the index is already quite possible, just exclude that directory. The problem to me is that it’s something the user has to do, and subsequently continue to manage. It’s not something that can be solved automatically because user requirements are so vastly different.

Sorry, I forgot to mention this here. This one was a user configuration/distro confusion error.

I think it is very important to not forget the noob users (like me) when adding things like this. I do not want to get into the command line when making my desktop behave as expected and I do not want to script.

The average user will not realise that Baloo is causing power drain and indexing itself into oblivion: it took a misconfiguration in NixOS causing power drain for me to realize that something was off with Baloo at all.

There should be at least some warnings about baloo behaving weirdly to prompt the users to make some adjustments imho. I see how a one-size-fits-all solution is likely not feasible.
