How do I troubleshoot Baloo?

I added an album to my collection and noticed that Elisa only recognised some of the songs. When I changed the indexer to scan the file system directly, all of the tracks were indexed just fine. When I changed back to using Baloo, the same files disappeared from my collection.

Then I went into the music folder and ran balooctl index "08 Watery Graves (slow).flac", refreshed Elisa’s database, and saw the file appear in Elisa. Running balooctl check does not find the files, so I can only assume this is a bug in Baloo. balooctl failed simply outputs All Files were indexed successfully. Is there any way for me to figure out why Baloo is choosing to not index these files?

I suggest you use balooctl to disable, purge, and enable to get a fresh index. Then, look at the “folder specific configuration” to add directories that aren’t under your home directory.

And then, balooctl --help will tell you what you can do.

I found that balooshow -x <file> exists. When I run this for files that are shown in Elisa I get the line Property Terms: Maudio Mflac T2 whereas the files that aren’t showing in Elisa have the line Property Terms: Mapplication Moctet Mstream.
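In case it helps anyone comparing many files, here is a minimal sketch that pulls the Property Terms line out of balooshow -x output and checks for the Maudio term. It assumes the M-prefixed terms reflect the mimetype Baloo recorded, which is only inferred from the two outputs quoted above; the function names are made up for illustration.

```python
def property_terms(balooshow_output: str) -> list[str]:
    """Return the terms listed on the 'Property Terms:' line, or [] if absent."""
    for line in balooshow_output.splitlines():
        label, sep, rest = line.strip().partition(":")
        if sep and label == "Property Terms":
            return rest.split()
    return []


def looks_like_audio(terms: list[str]) -> bool:
    # Assumption: an 'Maudio' term means Baloo recorded an audio/* mimetype.
    return "Maudio" in terms


if __name__ == "__main__":
    good = "Property Terms: Maudio Mflac T2"
    bad = "Property Terms: Mapplication Moctet Mstream"
    for sample in (good, bad):
        terms = property_terms(sample)
        print(terms, "audio" if looks_like_audio(terms) else "not audio")
```

You would feed it the text captured from balooshow -x for each file and list the ones that come back as not audio.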

I’ll try purging and re-indexing to see if that fixes the issue (I wanted to keep this as a last resort in case it takes ages to scan everything).

Yep, purging it fixed the issue. I was hoping to figure out why baloo didn’t scan the files properly the first time around, but oh well. My only guess is that it scanned them as they were being unzipped from an archive, when they didn’t look like music files yet :man_shrugging:

I claim baloo is broken by design because users aren’t allowed to see into the index or get details of what it is doing. It fails in so many ways, which is sort of understandable given the open-ended variety of file formats it handles, but by design users can’t work out what is wrong and so can’t give good reports of failures. Some years ago baloo tried to kill my desktop by exceeding the TBW (terabytes written) rating of the SSD; if I hadn’t noticed, it would have taken only a couple of weeks to exceed it. Now, I don’t let it run.

I read something similar in another post (I don’t remember if it was you or someone else), but baloo does have some debugging tools, one of them offered right here by the OP.

There’s:

balooctl status
balooctl check
balooctl index
balooctl clear
balooctl monitor
balooctl indexSize
balooctl failed
balooshow -x [file|id|inode...]

Edit: meant to add: I know it’s not perfect but it’s also not a blackbox as you seemed to imply.

If it happens again, you could try balooctl clear </path/to/file> and balooctl index </path/to/file> before purging everything.
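If you want to do that for a batch of files, something like the following sketch could work. It only wraps the two balooctl subcommands mentioned above (clear and index); the function names are hypothetical, and the binary name is a parameter on the assumption that some installs name it differently.

```python
import shutil
import subprocess


def reindex_commands(path: str, balooctl: str = "balooctl") -> list[list[str]]:
    """Build the clear-then-index command pair for one file (nothing is run)."""
    return [[balooctl, "clear", path], [balooctl, "index", path]]


def reindex(path: str, balooctl: str = "balooctl") -> None:
    """Run the command pair; raises if balooctl is missing or a command fails."""
    if shutil.which(balooctl) is None:
        raise FileNotFoundError(f"{balooctl} not found on PATH")
    for cmd in reindex_commands(path, balooctl):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try.
    for cmd in reindex_commands("08 Watery Graves (slow).flac"):
        print(cmd)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces or parentheses.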

I’m not sure, but I think that when I last tried to make baloo work (my notes suggest 2020) there were not so many balooctl commands, and there was no balooshow. After I found it stuck in a loop writing continuously I haven’t let it run since.

(Sorry to come in late.)

Let me put in a plug for the Baloo page on KDE community Wiki, in particular the section Indexing limitations. If you can figure out why it only indexed some music files, file a bug and/or note it in that section.

As you surmise, it sounds like baloo did “index” your files, but didn’t think some of them were music files the first time. I’m not sure what Baloo checks to decide whether it needs to reindex a file: size, modified time, ?? I would think that something changes after a file is completely extracted.

If it happens again, as that wiki page says, use kmimetypefinder5 myMusicFile.flac to see what mimetype each music file is. If KDE doesn’t think a FLAC file is of type audio/flac, then the problem isn’t in Baloo.

If KDE thinks they are flac files, Baloo actually uses another Frameworks library, kfilemetadata, to extract information from files. In many cases that in turn relies on other libraries like exiv2 and Poppler, and (painfully) the unmaintained catdoc programs for extracting info from old Office files (obligatory xkcd reference: xkcd 2347, “Dependency”). But I think for flac files kfilemetadata uses its built-in taglibextractor, which gets tag info such as album, artist, releaseYear, etc. from flac files.

I suspect one problem with Baloo is it doesn’t report some errors and failures from kfilemetadata’s indexers. I’ve never seen output from balooctl failed. One thing that would help is if distros shipped the dump utility from kfilemetadata, which shows what info kfilemetadata extracts from a file without going through all the layers.


Thanks, I think the issue was Baloo indexing a file before the file was totally complete, e.g. indexing whilst being extracted from an archive or being ripped from a CD. When I rescanned the files Baloo correctly detected them as FLAC files. If this hypothesis is correct, then maybe Baloo should only scan a file if it has not been modified for X period of time, e.g. 1-2 seconds.

I know the issue is not in KFileMetadata or TagLib, since the files appeared in Elisa when using the built-in indexer, and Elisa uses KFileMetadata to read music tags.

I confirm this problem when I extract any downloaded JDK or JRE tar, or any zipped git folder.

The solution I follow is to avoid extracting those files inside baloo-watched directories, and to refresh the baloo index from time to time by disabling it, purging its index, and then restarting it.

File a bug please. I couldn’t reproduce. I unzipped a 2.6GB zip file of 81 flac files into an indexed directory, and baloo immediately started indexing the file contents and within a few minutes baloosearch could find the files searching by metadata and balooshow -x showed the music file tag information. I had turned on baloo debug output and the journal indicated a dozen files were indexed twice by different baloo_file_extractor processes, but all the info was there.

If you had an incredibly large zip file or very slow unzipping it could happen. But there are so many variables with file systems (I’m on btrfs), zip programs, file contents, how busy your computer was during unzip and indexing, etc. it’s hard to know what’s key to causing the bug.

I don’t remember, but I’m sure I filed some bugs like these; I will try to find them. I suspect the main problem is that when the tree structure of a zipped folder has too many sub-folders, baloo will do something bad.

From my experience using KDE Plasma, I can say with confidence that baloo and akonadi are the only two components that are not totally safe for long-term use: they will break at some point, and they always need a fresh reset.

and I replied

But I had a crazy similar problem while trying to reproduce a different bug. I copied a .txt file from an NTFS file system partition to a btrfs partition, and terms in the copy weren’t indexed.

% balooshow -x ~/Documents/temp/Register_start_utf8.txt
bb556218b57ee 562780142 767318 /home/spage/Documents/temp/Register_start_utf8_copy.txt
        Mtime: 1701166752 2023-11-28T02:19:12
        Ctime: 1701168951 2023-11-28T02:55:51
Internal Info
File Name Terms: Fregister Fstart Ftxt Futf8
XAttr Terms:
Plain Text Terms:
Property Terms: Mapplication Mx Mzerosize

No terms from the text file’s contents, and note the property “Mzerosize”; I have never noticed this before in balooshow output. Skimming the baloo source code, I think a file gets this attribute if it should be indexed and its QFileInfo st_size is 0… but this file is not size zero! I think I had to balooctl clear and then balooctl index to get the file to index properly.

So if a file’s contents aren’t indexed, see if balooshow -x shows this bogus Mzerosize.
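A small sketch of that check, in case someone wants to script it: it looks for Mzerosize on the Property Terms line and compares that with the file's real size. The function names are hypothetical, and my reading of what Mzerosize means comes only from skimming the source as described above.

```python
def has_zerosize_term(balooshow_output: str) -> bool:
    """True if the 'Property Terms:' line of balooshow -x output lists Mzerosize."""
    for line in balooshow_output.splitlines():
        if line.strip().startswith("Property Terms:"):
            return "Mzerosize" in line.split(":", 1)[1].split()
    return False


def bogus_zerosize(balooshow_output: str, actual_size: int) -> bool:
    """True if the index says zero size but the file on disk is not empty."""
    return has_zerosize_term(balooshow_output) and actual_size > 0


if __name__ == "__main__":
    sample = (
        "File Name Terms: Fregister Fstart Ftxt Futf8\n"
        "XAttr Terms:\n"
        "Plain Text Terms:\n"
        "Property Terms: Mapplication Mx Mzerosize\n"
    )
    # actual_size would normally come from os.path.getsize(path).
    print(bogus_zerosize(sample, actual_size=1234))
```

Files flagged this way are candidates for balooctl clear followed by balooctl index.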

These bugs are heck to reproduce.

For debugging, see also Igor Poboiko’s baloo-checkdb.py

There’s a description here, under:

Installation instructions vary with your distribution but look something like:
https://bugs.kde.org/show_bug.cgi?id=472197#c2

There’s a recent fix for the cases where baloo_file_extractor crashes:

These previously didn’t always result in the file being flagged “failed”, it depended on how the core dumps were being handled.

Edit: Found the bug report:
https://bugs.kde.org/show_bug.cgi?id=421317#c6