Does Baloo have a future?

Hello, my dear young friends!

If I see it correctly, Baloo is an integral part of KDE/Plasma. In Krunner there is a search module for desktop search, via Dolphin it should be possible to find files and content. It is currently not particularly convenient to use, but could be a useful start to replace established desktop searches like Recoll in the long run.

Now there are already reported issues that are not corrected, at least not from the user’s point of view. As a striking example, I cite the duplication of search results, which should be factually impossible because in the duplicate the “found” directories do not exist. The elapsed time since the bug reports naturally raises the question if the development of this framework will be continued or if something else is planned.

How do you see it: where is Baloo headed, and is it worth waiting?

(I had already asked this question in the old forum, under “KDE Frameworks & Development”; please answer it here.)

2 Likes

Personally, I have no need for a file indexer, so I always disable Baloo on every system I own. But that is just my preference. When I need to find a file, I use fd.

I have not had any issues with Baloo, especially since I added my git repositories folder to ignored folders.

I wouldn’t even know such thing exists if I had not heard of it :sweat_smile: For me it has worked fine.

4 Likes

baloo-kf5: baloo displays search results multiple times.:
#+begin_quote
Dear Maintainer,

The problem should be well known, e.g. it has been reported here over the years:

It has nothing to do with the underlying file system, here XFS.

It doesn’t matter how often baloo’s database is reset and
re-indexed, after a short time the duplicates will multiply
miraculously.
#+end_quote
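
One rough way to check whether the duplicated hits really point at directories that do not exist (as described in the opening post) is to pipe the search output through a small filter. This is only a sketch: `missing_paths` is a name made up here, and it assumes the search tool prints one absolute path per line. Any other lines the tool prints (a timing summary, for example) would be listed as "missing" too.

```shell
# Hypothetical helper: read one path per line from stdin and
# print only those that do not exist on disk.
missing_paths() {
    while IFS= read -r path; do
        # Skip empty lines.
        [ -n "$path" ] || continue
        # -e follows symlinks, so dangling symlinks also count as missing.
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}
```

Usage would be something like `baloosearch <term> | missing_paths`; an empty result means every reported path exists.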

2 Likes

Yep, for me with these settings, it works perfectly:

Anyone who has specialized use cases involving huge files, data sets, or directory trees that don’t make sense to index, can make things a lot better by doing the same.

Most regular people don’t have use cases like this, which is why for them Baloo generally works without drama out of the box.

Baloo already tries its best to avoid indexing git repos and Virtualbox images and similar things, but the most reliable thing is to specifically tell it to ignore things you don’t need indexed.

4 Likes

So I have no special requests, ngraham, it’s just two clean directories that are read in, as the output of “balooctl config list includeFolders” shows. No $HOME and no hidden files and directories occupied by programs, no system files, but relatively many text files, some audio books in MP3 and Vorbis and some music in Flac. The “contentIndexing” is set to “no” and “excludeFilters” by default also includes the content you mentioned as problematic. The index has a size of 286 MiB. With this setting, Baloo should at least come close to the performance of locate(1).
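
For anyone who wants to compare their setup with the one described here, the settings can be dumped in one go. A minimal sketch, assuming `balooctl` is in the PATH (newer installations may ship it as `balooctl6`); `show_baloo_config` is just a made-up wrapper name, and `excludeFolders` is added to the keys mentioned above on the assumption that your version knows it.

```shell
# Print the Baloo settings discussed above, one section per key.
# The loop only wraps "balooctl config list <key>".
show_baloo_config() {
    for key in includeFolders excludeFolders excludeFilters contentIndexing; do
        printf '== %s ==\n' "$key"
        balooctl config list "$key"
    done
}
```

Calling `show_baloo_config` prints each key as a heading followed by whatever `balooctl` reports for it.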

I leave the proper content search to Recoll. Recoll’s database is currently 35918 MiB in size and collects all content, including especially special use cases with huge files, data sets and all kinds of directory trees - whatever you want to imagine by that. And of course, in this scenario, the data is scattered across the internal network. Because that’s what it’s all about for me, I need such stuff after more than a quarter of a century with the PC, dear Kresimir, because it shouldn’t matter where the data is stored as long as I can find it again and keep my head clear. And if I am looking for a pin in a pile of pins, I will find it with Recoll. (By the way, also from the console, by highlighting words or sentences in any program thanks to clipboard actions, but also via helm-recoll directly in the editor).

However, the KDE community has decided in favor of Baloo, which is quite ok, but its users have obviously been practicing appeasement ever since. As if users are too stupid to use an application that was explicitly created for end users. If you really see it that way then write it in big letters on the application and take away the user’s configuration options, done. Instead, you might think the configuration is there to chalk up possible errors to the user, as a backdoor, so to speak. But I don’t want to believe that for now. Another part of the rather passive community complains that such an indexer makes itself felt in the consumption of CPU and memory and is desperately looking for the off switch (see social media). That’s how different people are, isn’t that funny? And with all this, it doesn’t go any further. Desktop developers are left sitting on their software, as Esk-Riddell wrote today (“Getting KDE Apps to our Users”), for example, and age-old stories like Emacs and Vim are blossoming into new life.

I hope you also had a little fun reading and wish you a pleasant evening, or morning, depending on where you pitched your tent. :wink:

I get you are frustrated, but some bugs are harder to fix than others, some features take longer to implement than others.

Dolphin already lets you choose different search tools. Not sure about Recoll, since I have never used it and Baloo is currently sufficient for me. Would a better Recoll integration be enough for you? Or is it about dropping Baloo as a default?

Making such big changes as dropping support for Baloo and using something different, even if decided, would take time anyway (I think). Lashing out at people for having a different experience with Baloo, or for trying to help, will not help you change anyone’s mind.

Since you already have an alternative to Baloo and are happy with it, tell the Developers instead which integration into KDE is missing.

I have no internal knowledge of the decision making of the KDE Project, but I’m sure that two Developers having a different experience with Baloo does not mean your voice isn’t heard or considered.

1 Like

The elapsed time since the bug reports naturally raises the question if the development of this framework will be continued or if something else is planned. How do you see it: where is Baloo headed, and is it worth waiting?

You mean worth waiting for something (one of the following)?

  • a developer appears to maintain baloo
  • someone develops a replacement
  • KDE decides to use existing software as a replacement
  • KDE stops shipping a file indexer

The first I think is quite a valid possibility. The only counterpoint I can see is:

  • people on the internet have been badmouthing the project, sometimes even without reason, and recommending to disable it instead of helping report and fix bugs for years already, so it’s seen as a thankless job and as a result fewer people want to work on it, including the previous maintainer

The second I think is not going to arrive anytime soon. I have a few reasons to think so:

  • nobody has come up with a replacement in almost a decade
  • baloo is actually really fast and efficient for what it is
    • in over four years of r/kde I’ve never seen anyone complain about its search speed: it’s pretty much instantaneous
    • it is capped at 25% CPU usage for usually a minute during basic file indexing
    • it takes more time, but uses way less CPU during content indexing, oscillating between 1%, 3%, 5%, and having occasional peaks of 18% in my tests
    • last year it was capped even further to behave without substantial changes to the indexing speed
  • the thing is first and foremost a library, and is centralized, meaning any app using its API can have access to the same file data; that is to say, a replacement needs to provide the same for a whole desktop environment and apps suite, which is a lot of work

The third just seems unlikely. The only candidate that I find realistic as a replacement for the final product is GNOME Tracker, but it depends on GIO and indexes so much less. Xapian doesn’t seem like a good library replacement given this page exists. Recoll finishes content indexing before baloo, but that’s because it uses double the CPU.

The fourth doesn’t make sense to me unless it’s a temporary measure. Every other system out there has a file indexer.

7 Likes

Used Baloo for maybe three days and never enabled it again. For the few times I need a file search, Recoll did have one advantage, namely finding words in PDFs, something Fsearch doesn’t do. At least, not to my knowledge. So, in my case, if I’d really need some sort of GUI frontend it’d be Fsearch for its speed and low footprint. I’ll do my PDF searching with pdfgrep. But most certainly not Baloo.

Hey @Herzenschein,

I basically agree with everything you say.

That being said, isn’t there a fifth option? Let people choose which file indexer they use, let Distros decide which to ship as a default.

I’m not a Developer and have no knowledge of file indexers, so I don’t know how much work it would be for KDE to support multiple file indexers. Is that not an option?

DITTO Baloo is completely worthless in my book.

My first thought would be that for that to happen, instead of using baloo’s API directly, we’d need to use some other library API that abstracts calls to those other indexers. So…

Elisa -------\                 / baloo
Kickoff ------> IndexerCaller > recoll
KRunner -----/                 \ tracker

That sounds like it would just make things way less maintainable, since it would mean KDE would have to explicitly support not one, but multiple indexers, or at the very least both baloo and this additional library.

Either that, or some kind of standardized protocol agreed to by all three indexers regarding how to interact with the system.
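
To make the maintenance cost a bit more concrete, here is a rough shell sketch of the “IndexerCaller” idea from the diagram: one entry point, one adapter per indexer. Every function name here is hypothetical, and the adapters are deliberately left as placeholders; a real one would have to call the indexer’s own tool or API and normalise its output, and that per-indexer work is exactly what would need maintaining.

```shell
# Placeholder adapters, one per indexer (names are made up).
# Each would need to translate the query and normalise the results.
search_baloo()   { echo "baloo adapter not implemented" >&2; return 1; }
search_recoll()  { echo "recoll adapter not implemented" >&2; return 1; }
search_tracker() { echo "tracker adapter not implemented" >&2; return 1; }

# indexer_search BACKEND QUERY...
# Single entry point that apps would call instead of a specific indexer.
indexer_search() {
    if [ "$#" -lt 2 ]; then
        echo "usage: indexer_search BACKEND QUERY..." >&2
        return 2
    fi
    backend=$1
    shift
    case "$backend" in
        baloo|recoll|tracker) "search_$backend" "$@" ;;
        *) echo "unknown indexer: $backend" >&2; return 2 ;;
    esac
}
```

The dispatcher itself is trivial; the three adapters behind it are where the work (and the bugs) would live.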

In both cases, not really feasible, yeah. And such a thing doesn’t really make sense anyway; the indexer is an implementation detail. It’s not the kind of thing a normal person has an opinion about.

If you’re having trouble with the indexer, the solution is to submit a bug report explaining exactly what’s going wrong. Even if no developers are available to handle it right now, that won’t always necessarily be true.

5 Likes

I had big issues in the past with Baloo and they went away when I configured Baloo to only index a few of my directories (basically only ~/Documents, ~/Music, ~/Pictures and ~/Downloads). The biggest issue is that the default of indexing everything is terrible.

Ideally, instead of indexing everything, we should by default index only some file types known to be safe and useful, and probably also only the ~/Documents, ~/Music, ~/Movies, ~/Pictures and ~/Downloads folders.
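
For anyone who wants to try a setup like this by hand, a dry-run sketch: the function below only prints the `balooctl config add includeFolders` commands for the folders you pass in, it does not run them. `baloo_index_only_cmds` is a made-up name, and paths containing single quotes are not handled. Whether the home directory also has to be removed from `includeFolders` (via `balooctl config rm includeFolders`) depends on your current configuration, so check that first.

```shell
# Print (not run) one balooctl command per folder to be indexed.
# Review the output before executing any of it.
baloo_index_only_cmds() {
    for dir in "$@"; do
        printf "balooctl config add includeFolders '%s'\n" "$dir"
    done
}
```

For example, `baloo_index_only_cmds "$HOME/Documents" "$HOME/Music"` prints two commands that can be checked and then pasted into a terminal.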

5 Likes

In my experience, such bug reports lead to the developer asking the reporter to prove that the error is actually attributable to the program. We already see here, in this discussion forum, that the “blame” is readily placed on the user. Even when the Internet is full of reports about the inadequacy of the software and even similar bug reports have been written. As a user, I assume that software errors that I notice as a layman must literally jump into the eye of a developer who understands his craft. In addition, such reports are time-consuming and can become very nerve-wracking. Based on this experience, it has been a good idea to switch to alternatives in the past. If I still address software bugs today, then only with tools that are really worth it in my opinion. Because starting a search with Baloo from any KDE program should sound interesting to you, too.

To take the shortcut now, as suggested, and to cut the desktop search even further, just to conceal possible errors in the development, is wrong in my opinion. Because that would mean giving the user a toy instead of a search engine. But as long as the software is avoided by its own developers, these are only theoretical considerations anyway.

You mean like most search programs already do. Baloo has issues from time to time, period. It’s amazing: there will be no major issues reported for months, then out of the blue, just after an update, there are tons of reports on Baloo eating too many resources. That screams the program, not the end user.

There’s no way to export and import baloo configurations, right? It’d help with diagnosis, so I’m just checking.

There is. The actual configuration is stored in ~/.config/baloofilerc while the actual index itself lives at ~/.local/share/baloo/index. Of course the index will be huge and full of personal information, so sharing it isn’t very practical.
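
A minimal sketch of an “export”, copying only the configuration file and never the index. The function name and the `.export` suffix are made up, and honouring `XDG_CONFIG_HOME` is an assumption on top of the path given above. The file can contain folder names, so it is worth reading before attaching it to a bug report.

```shell
# Copy the Baloo configuration file (not the index) into a directory.
# Usage: export_baloo_config [DEST_DIR]   (defaults to the current dir)
export_baloo_config() {
    src="${XDG_CONFIG_HOME:-$HOME/.config}/baloofilerc"
    dest_dir=${1:-.}
    if [ ! -f "$src" ]; then
        echo "no config found at $src" >&2
        return 1
    fi
    cp -- "$src" "$dest_dir/baloofilerc.export"
}
```

Importing would be the reverse copy; whether Baloo picks up the changed file without being restarted is not something this sketch covers.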

1 Like

FWIW it’s been years since I had an issue with baloo and I index about 200 GB of my home directory (content included):

$ LANG=C balooctl status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 100,620
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 5.16 GiB

Of course, I have several directories excluded from indexing, my /home is more than 200GB:

$ df -h /home/
Filesystem             Size   Used Avail Use% Mounted on
/dev/mapper/SSD-home   427G   384G   43G  90% /home

So, I’d say that for every bug report or complaining user there must be at least 10 who don’t have problems at all; we just don’t hear from them (and that’s a good thing).

6 Likes

We should not become even more political by declaring every eleventh user an expendable minority. Especially since this assumption is not valid at all, because as I wrote before, it is easier to switch to an alternative than to expose oneself to any stress.

Another point that is disturbing is how people like to draw conclusions about others. Freely according to the motto, if it works for me then everything is fine, the others just do not understand, can not help themselves, cause problems where there are none. The argument “I” and “it works for me” is often used as proof. However, this only works for a small social group in which you are known and your opinion is already valued because of your status in the group. The number of indexed files is also just a spurious argument. You can be personally proud of over 100k documents created, but if someone comes along and opposes that with 120k documents that were not (!) included during indexing, then it can quickly become unpleasant in the community.

In summary, no one has accused the developers of knowingly releasing defective software. On their computers, the software apparently ran just as smoothly as it now runs on the computers of many others.

As far as the configurations of Baloo are concerned, they are displayed to you in the shell by means of “balooctl config list […]”. Apparently some assume that if the user has specified a directory twice in the configuration, Baloo treats it as two directories.

The strange entries in the syslog do not necessarily inspire user confidence either, here are a few exemplary lines from the more than 6000 of the past 30 days:

/var/log/syslog:2023-05-15T07:13:16.258302+02:00 acht baloo_file[2199]: 661425028609 "/wlf/VA_H\xC3\xB6rspiele/Luis Sep\xC3\xBAlveda, 2002: Tagebuch eines sentimentalen Killers/Tech" renaming "Red Notice (2021).mkv" to "Tech"
/var/log/syslog:2023-05-15T07:13:16.258605+02:00 acht baloo_file[2199]: 670014963201 "/wlf/VA_H\xC3\xB6rspiele/Luis Sep\xC3\xBAlveda, 2002: Tagebuch eines sentimentalen Killers/Tech/Luis Sep\xC3\xBAlveda, 2002: Tagebuch eines sentimentalen Killers-sha256.sum" renaming "The Witcher - S01E01 - The End's Beginning.mkv" to "Luis Sep\xC3\xBAlveda, 2002: Tagebuch eines sentimentalen Killers-sha256.sum"
/var/log/syslog:2023-05-15T07:13:16.317030+02:00 acht baloo_file[2199]: 721554570753 "/wlf/VA_H\xC3\xB6rspiele/Norbert Lang, 2020: How dare you! Echo einer Rede Eine sorgf\xC3\xA4ltig komponierte Kollage zur Klimakrise/Artwork" renaming "13_Hebrew.srt" to "Artwork"
/var/log/syslog:2023-05-15T07:13:16.318662+02:00 acht baloo_file[2199]: 588410584577 "/wlf/VA_H\xC3\xB6rspiele/Thomas Pynchon, 2020: Die Enden der Parabel/Artwork" renaming "Tech" to "Artwork"
/var/log/syslog:2023-05-15T07:13:16.332880+02:00 acht baloo_file[2199]: 622770322945 "/wlf/VA_H\xC3\xB6rspiele/Thomas Pynchon, 2020: Die Enden der Parabel/Artwork/Thomas Pynchon, 2020: Die Enden der Parabel.jpg" renaming "Tech" to "Thomas Pynchon, 2020: Die Enden der Parabel.jpg"
/var/log/syslog:2023-05-15T07:13:16.332956+02:00 acht baloo_file[2199]: 627065290241 "/wlf/VA_H\xC3\xB6rspiele/Thomas Pynchon, 2020: Die Enden der Parabel/Artwork/Thomas Pynchon, 2020: Die Enden der Parabel.jpg.xmp" renaming "Seelen (2013)-sha256.sum" to "Thomas Pynchon, 2020: Die Enden der Parabel.jpg.xmp"
/var/log/syslog:2023-05-15T07:13:16.333034+02:00 acht baloo_file[2199]: 631360257537 "/wlf/VA_H\xC3\xB6rspiele/Thomas Pynchon, 2020: Die Enden der Parabel/Artwork/cover500.jpg" renaming "Cheech and Chong (1978\xE2\x80\x93""1983)" to "cover500.jpg"
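
For anyone who wants to see how many such entries their own system produces, a count along these lines should do. It is only a sketch: `count_baloo_renames` is a made-up name, and the pattern is derived from the lines quoted above, so it assumes your log uses the same message format.

```shell
# Count baloo_file "renaming" messages in a syslog-style file.
# Note: grep -c prints 0 but exits non-zero when nothing matches.
count_baloo_renames() {
    grep -c -e 'baloo_file\[[0-9]*]: .* renaming "' -- "$1"
}
```

For example, `count_baloo_renames /var/log/syslog` prints the number of matching lines in that file.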