I added an album to my collection and noticed that Elisa only recognised some of the songs. When I switched the indexer to scanning the file system directly, all of the tracks were indexed just fine. When I switched back to Baloo, the same files disappeared from my collection.
Then I went into the music folder, ran balooctl index "08 Watery Graves (slow).flac", refreshed Elisa's database, and saw the file appear in Elisa. Running balooctl check does not pick up the missing files, so I can only assume this is a bug in Baloo. balooctl failed simply outputs "All Files were indexed successfully." Is there any way for me to figure out why Baloo is choosing not to index these files?
I found that balooshow -x <file> exists. When I run it on files that Elisa shows, I get the line "Property Terms: Maudio Mflac T2", whereas the files that aren't showing in Elisa have the line "Property Terms: Mapplication Moctet Mstream".
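If you want to run that comparison over a whole folder instead of one file at a time, it can be scripted. The sketch below is a hypothetical helper, not part of Baloo. It assumes the balooshow -x output contains a line starting with "Property Terms:" followed by space-separated terms, exactly as quoted in this thread, and it treats the presence of the Maudio term as "Baloo thinks this is audio" (the M-prefixed terms look like pieces of the detected MIME type, e.g. audio/flac vs. application/octet-stream, but that reading is an inference).

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def property_terms(balooshow_output: str) -> list[str]:
    """Return the whitespace-separated terms on the 'Property Terms:' line.

    Assumption: the line format matches the output quoted in this thread.
    """
    for line in balooshow_output.splitlines():
        line = line.strip()
        if line.startswith("Property Terms:"):
            return line[len("Property Terms:"):].split()
    return []


def looks_like_audio(terms: list[str]) -> bool:
    """True if the terms include 'Maudio' (audio MIME type, as seen in this thread)."""
    return "Maudio" in terms


def main(folder: str) -> None:
    for path in sorted(Path(folder).glob("*.flac")):
        # balooshow -x prints Baloo's internal index data for a single file
        out = subprocess.run(
            ["balooshow", "-x", str(path)],
            capture_output=True, text=True, check=False,
        ).stdout
        terms = property_terms(out)
        status = "audio" if looks_like_audio(terms) else "NOT audio"
        print(f"{status:9}  {path.name}  {' '.join(terms)}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Run it as, for example, python check_baloo_terms.py ~/Music/SomeAlbum (the script name is hypothetical). Files reported as "NOT audio" are candidates for balooctl index, or for a purge if there are many of them.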
I'll try purging and re-indexing to see if that fixes the issue. (I wanted to keep this as a last resort in case it takes ages to scan everything.)
Yep, purging fixed the issue. I was hoping to figure out why Baloo didn't scan the files properly the first time around, but oh well. My only guess is that it scanned them while they were being unzipped from an archive, when they didn't look like music files yet.
I claim baloo is broken by design because users aren't allowed to see into the index or get details of what it is doing. It fails in so many ways, which is sort of understandable given the open-ended variety of file formats it handles, but by design users can't work out what is wrong and so can't give good reports of failures. Some years ago baloo tried to kill my desktop by wearing out the SSD: if I hadn't noticed, it would only have taken a couple of weeks to exceed the drive's TBW (terabytes written) rating. Now, I don't let it run.
I'm not sure, but I think that when I last tried to make baloo work (my notes suggest 2020), there were fewer balooctl commands and there was no balooshow. Since I found it stuck in a loop writing continuously, I haven't let it run.
Let me put in a plug for the Baloo page on the KDE Community Wiki, in particular its "Indexing limitations" section. If you can figure out why it only indexed some of your music files, file a bug and/or note it in that section.
As you surmise, it sounds like baloo did "index" your files but didn't think some of them were music files the first time. I'm not sure what Baloo checks to decide whether it needs to reindex a file: size, modification time, something else? I would think that something changes after a file is completely extracted.
If it happens again, as that wiki page says, use kmimetypefinder5 myMusicFile.flac to see what MIME type KDE assigns to each music file. If KDE doesn't think a FLAC file is of type audio/flac, then the problem isn't in Baloo.
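For a whole album, that check can be looped over every file. This is a minimal sketch, assuming kmimetypefinder5 prints just the MIME type name (e.g. audio/flac) on stdout. The tool name is a parameter because newer systems may ship it without the "5" suffix; treat that as an assumption to verify on your distro. The helper names are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Assumption: KDE Frameworks 5 name; adjust if your distro ships a different binary.
TOOL = "kmimetypefinder5"
EXPECTED = "audio/flac"


def detect_mimetype(path: Path, tool: str = TOOL) -> str:
    """Run the KDE MIME-type finder on one file and return its stripped output."""
    result = subprocess.run([tool, str(path)], capture_output=True, text=True, check=False)
    return result.stdout.strip()


def mismatched(results: dict[str, str], expected: str = EXPECTED) -> list[str]:
    """Names of files whose detected MIME type differs from the expected one."""
    return sorted(name for name, mime in results.items() if mime != expected)


def main(folder: str) -> None:
    results = {p.name: detect_mimetype(p) for p in sorted(Path(folder).glob("*.flac"))}
    bad = mismatched(results)
    for name in bad:
        print(f"{results[name] or '(no output)'}\t{name}")
    print(f"{len(bad)} of {len(results)} file(s) not detected as {EXPECTED}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

If every file comes back as audio/flac but Baloo still lists some as application/octet-stream, the MIME detection in KDE is fine and the suspicion moves back to Baloo itself.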
If KDE does think they are FLAC files: Baloo actually uses another Frameworks library, kfilemetadata, to extract information from files. In many cases that in turn relies on other libraries such as exiv2 and Poppler, and (painfully) the unmaintained catdoc programs for extracting info from old Office files (obligatory xkcd reference: xkcd 2347 "Dependency"). But I think for FLAC files kfilemetadata uses its built-in taglibextractor, which gets tag info such as album, artist, releaseYear, etc.
I suspect one problem with Baloo is that it doesn't report some errors and failures from kfilemetadata's extractors. I've never seen any output from balooctl failed. One thing that would help is if distros shipped the dump utility from kfilemetadata, which shows what info kfilemetadata extracts from a file without going through all the layers.
Thanks. I think the issue was Baloo indexing a file before it was completely written, e.g. indexing it while it was being extracted from an archive or ripped from a CD. When I rescanned the files, Baloo correctly detected them as FLAC files. If this hypothesis is correct, then maybe Baloo should only scan a file once it has not been modified for some period X, e.g. 1-2 seconds.
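The proposed "wait until the file has gone quiet" rule can be sketched as a small queue. This is a hypothetical illustration of the idea, not how Baloo is actually implemented (Baloo is C++ and its real scheduling is not shown here). It assumes the modification time is a good enough signal that writing has stopped, and the 2-second value is simply the upper end of the suggestion above.

```python
from __future__ import annotations

import os
import time
from typing import Callable, Optional

# Hypothetical "X" from the proposal; 1-2 seconds was suggested.
QUIET_PERIOD_S = 2.0


def is_quiescent(mtime: float, now: float, quiet_period: float = QUIET_PERIOD_S) -> bool:
    """True if the file has not been modified for at least quiet_period seconds."""
    return now - mtime >= quiet_period


class DeferredIndexQueue:
    """Hypothetical queue: hold changed paths until they have gone quiet."""

    def __init__(self, quiet_period: float = QUIET_PERIOD_S,
                 mtime_of: Callable[[str], float] = os.path.getmtime) -> None:
        self.quiet_period = quiet_period
        self.mtime_of = mtime_of  # injectable so the logic can be tested without real files
        self.pending: set[str] = set()

    def notify_changed(self, path: str) -> None:
        """Call on every change event; repeated events just keep the path queued."""
        self.pending.add(path)

    def take_ready(self, now: Optional[float] = None) -> list[str]:
        """Dequeue and return paths whose last modification is old enough."""
        now = time.time() if now is None else now
        ready = []
        for path in sorted(self.pending):  # sorted() copies, so discarding below is safe
            try:
                mtime = self.mtime_of(path)
            except FileNotFoundError:
                self.pending.discard(path)  # deleted before it went quiet
                continue
            if is_quiescent(mtime, now, self.quiet_period):
                ready.append(path)
        self.pending.difference_update(ready)
        return ready
```

An indexer built this way would call take_ready() on a timer and hand the returned paths to its extractor. A file that is still being unzipped or ripped keeps getting a fresh mtime, so it stays pending and is re-checked on the next tick instead of being classified from a half-written header.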
I know the issue is not in KFileMetadata or TagLib, since the files appeared in Elisa when using the built-in indexer, and Elisa uses KFileMetadata to read music tags.
Please file a bug; I couldn't reproduce this. I unzipped a 2.6 GB zip file of 81 FLAC files into an indexed directory, and baloo immediately started indexing the file contents. Within a few minutes baloosearch could find the files by searching on metadata, and balooshow -x showed the music files' tag information. I had turned on baloo debug output, and the journal indicated that a dozen files were indexed twice by different baloo_file_extractor processes, but all the info was there.
It could happen if you had an incredibly large zip file or very slow unzipping. But there are so many variables, such as the file system (I'm on btrfs), the zip program, the file contents, and how busy your computer was during unzipping and indexing, that it's hard to know what's key to causing the bug.
I don't remember exactly, but I'm sure I filed some bugs like these; I will try to find them. I suspect the main problem is that when a zipped folder's tree has too many sub-folders, baloo does something bad.
From my experience using KDE Plasma, I can say that baloo and akonadi are the only two components that aren't totally safe for long-term use: they will break at some point and always need a fresh reset.
No terms from the text file's contents, and note the property "Mzerosize"; I have never noticed this in balooshow output before. Skimming the baloo source code, I think a file gets this attribute if it should be indexed and its reported size (QFileInfo / st_size) is 0… but this file is not size zero! I think I had to run balooctl clear and then balooctl index to get the file indexed properly.
So if a file’s contents aren’t indexed, see if balooshow -x shows this bogus Mzerosize.
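To hunt for that symptom across a folder, you can compare what balooshow -x says with the real size on disk. This is a hypothetical sketch: it assumes "Mzerosize" shows up as a separate whitespace-separated term in the balooshow -x output, as described here. It calls balooshow once per file, so it will be slow on large trees.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def has_zerosize_term(balooshow_output: str) -> bool:
    """True if 'Mzerosize' appears as a separate term in balooshow -x output."""
    return "Mzerosize" in balooshow_output.split()


def is_bogus_zerosize(balooshow_output: str, actual_size: int) -> bool:
    """The index claims the file is empty, but it actually has content on disk."""
    return has_zerosize_term(balooshow_output) and actual_size > 0


def main(folder: str) -> None:
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        out = subprocess.run(["balooshow", "-x", str(path)],
                             capture_output=True, text=True, check=False).stdout
        size = path.stat().st_size
        if is_bogus_zerosize(out, size):
            print(f"bogus Mzerosize ({size} bytes on disk): {path}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

For any file it flags, the workaround reported above was balooctl clear followed by balooctl index on that file.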