Spell Check -- Deep Dive

so i can see that kate uses ~/.hunspell_en_US when you tell it to add a new word to the dictionary.

but it adds NINE copies of each word for some reason

if i try just pasting a word into the plain text file and saving it, then reloading all the open files in kate, it is still underlined in the original file where it was copied from, as if the word was not added.

is the only way to add words to this file by using the menu “add to dictionary” feature? and what about all the duplicates? do those get purged at some point?

and this is only for kate… there are separate user defined dictionaries all over the place for all different kinds of apps (firefox, libre office, onlyoffice, vscodium… etc).

how do you all deal with this?

UPDATE: for kate to pick up words that are just pasted into the dictionary file, you have to log out and back in to a brand new session.


since no one else seems to be as reliant as i am on spell check, i went on a deep dive and came up with the following writeup on my adventure.

hopefully it benefits someone besides me.

Spell Check – Dictionary Maintenance

When a user adds a word via the right click “Add to Dictionary…” menu option, there is no single place where these additions are saved. Spell check dictionaries in Linux, and Kubuntu in particular, are spread out across various directories in a user's home folder. Each one is associated with the application the user was using at the time they added the word.

The result is often a confusing array of user dictionaries that may or may not contain an added word depending on which application is in use. Coordinating these is a manual process, which is outlined here.

Tools Needed

  • Meld for comparing and merging files
  • Kate for viewing and manipulating file contents
  • Dolphin for file management and navigating the directory tree
  • sort command line tool for alphabetizing a file full of words and removing duplicates
  • find, printf, tail commands for finding user dictionaries

Identifying User Dictionaries

The most likely applications for me to find added words in are the browser Firefox and the editor Kate, since that is where I do most of my typing, followed by my preferred markdown editor Typora. Other sources of new words could be any of the office suites installed on my machine like LibreOffice, Onlyoffice, WPS2019 or even code editors like VSCodium, though programming language terms are more technical in nature and thus poor candidates for merging with the rest.

The first step is to identify the file where added words are saved for each of the applications you consider a major source of new words. To this end, a handy command for listing the 10 most recently modified files under the current directory (run it from your home folder) is:

find -type f -printf "%T@ %p\n" | sort -n | tail -10

Run this immediately after using the “Add to dictionary…” feature in an application and you should be able to identify the file that was changed. Make note of this location and/or make a bookmark in Dolphin so you can easily get back to it in the future.
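
The same idea can be wrapped in a small function so it works from any directory and prints only the paths. This is a minimal sketch, not a tested tool: recent_files is a made-up helper name, and it assumes GNU find (for -printf), which is what Kubuntu ships.

```shell
# recent_files [DIR] [COUNT]: list the COUNT most recently modified files under DIR.
# Hypothetical helper name; relies on GNU find's -printf option.
recent_files() {
    local dir="${1:-$HOME}"
    local count="${2:-10}"
    # %T@ = modification time in seconds since the epoch, %p = file path.
    # Sort by time, keep the newest COUNT entries, then strip the timestamp.
    find "$dir" -type f -printf '%T@ %p\n' 2>/dev/null \
        | sort -n | tail -n "$count" | cut -d' ' -f2-
}

# Example: show the five newest files in the home folder
# recent_files "$HOME" 5
```

The newest file is printed last, so the dictionary that was just changed should be at or near the bottom of the list.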

Dictionary File Differences

The two main dictionary files that interest me are fortunately stored as plain text files without a header, just a simple list of words. This makes it easy to sort, remove duplicates, and cut and paste into other dictionary files as needed. It’s also possible to link these files together so they act as one, where additions from one application will appear in the other after a logout.
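
One way to do the linking is with a symbolic link. The following is only a sketch under that assumption: link_dictionary is a made-up helper name, and both files are assumed to be plain word lists with no header. It folds the words from the second file into the first, keeps a .bak copy, and then replaces the second file with a link to the first.

```shell
# link_dictionary MASTER OTHER: make OTHER a symlink to MASTER.
# Hypothetical helper; assumes both files are plain one-word-per-line lists.
link_dictionary() {
    local master="$1" other="$2"
    touch "$master"
    # If OTHER is still a regular file, merge its words into MASTER
    # and keep the original as OTHER.bak before replacing it.
    if [ -f "$other" ] && [ ! -L "$other" ]; then
        sort -u -o "$master" -- "$master" "$other"
        mv "$other" "$other.bak"
    fi
    # -s symbolic, -f replace an existing link, -n do not follow an existing link
    ln -sfn "$master" "$other"
}

# Example (the second path is a placeholder, not a real location):
# link_dictionary ~/.hunspell_en_US /path/to/other/dictionary
```

Some applications may rewrite their dictionary by replacing the file rather than editing it, which would break the link, and sandboxed (snap or flatpak) applications may not be able to follow a link that points outside their own directories. It is worth checking that the link is still in place after adding a word.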

When a dictionary file uses a header of any kind (even if it's just a numerical count of the number of words), you cannot just sort it without consequence, because the header line would be sorted in among the words. The purpose of the numerical count is unclear, though, since a mismatch between the count and the number of words does not always seem to matter.
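
For a file whose only header is a word count on the first line, the header can be set aside while the rest is sorted. This is a rough sketch under that assumption (a single count line followed by one word per line); sort_counted_dictionary is a made-up name, and any other header layout would need different handling.

```shell
# sort_counted_dictionary FILE: sort and de-duplicate everything after the
# first line, then rewrite the first line as the new word count.
# Hypothetical helper; assumes line 1 is a bare count and nothing else.
sort_counted_dictionary() {
    local file="$1" tmp
    tmp="$(mktemp)"
    # Skip the header line, then sort and drop duplicates
    tail -n +2 "$file" | sort -u > "$tmp"
    # Write the new count first, followed by the sorted words
    { wc -l < "$tmp" | tr -d ' '; cat "$tmp"; } > "$file"
    rm -f "$tmp"
}

# Example: work on a copy first
# cp /path/to/dictionary /path/to/dictionary.bak
# sort_counted_dictionary /path/to/dictionary
```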

The outlier in my case was Typora, which stores its added words in .json format. Adding words to this file therefore requires a bit of reformatting using Find and Replace before a cut and paste operation can be done. For this operation, a copy is made of the source dictionary first in case any find and replace work goes sideways.
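
The Find and Replace step can also be done on the command line. The sketch below assumes the .json file holds its words as double-quoted strings separated by commas; check your own file first, since the exact layout is not covered here. words_to_json_strings is a made-up name, and it only prints the reformatted words for pasting; it does not touch the .json file.

```shell
# words_to_json_strings FILE: print each word of a plain word list as "word",
# Hypothetical helper; assumes the target JSON stores words as quoted strings.
# Words containing double quotes or backslashes would need extra escaping.
words_to_json_strings() {
    # Drop empty lines, then wrap each remaining line in quotes plus a comma
    sed -e '/^$/d' -e 's/.*/"&",/' "$1"
}

# Example: reformat a copy of the Kate dictionary for pasting
# cp ~/.hunspell_en_US /tmp/words.txt
# words_to_json_strings /tmp/words.txt
```

The last pasted entry will usually need its trailing comma removed to keep the JSON valid.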

Another consideration is that not all of these applications will necessarily use the same starting point for their standard dictionary, so some may already contain words that need to be added to others.

Comparing and Merging Dictionaries

The plain text (no header) formats are the easiest to maintain, since tools like Meld can be used to zipper together separate collections that may or may not have significant overlap. However, Meld works best when the files are ordered for easy visual reference. To that end, a sort command is useful to create a common starting point:

sort -u /path/to/dictionary -o /path/to/dictionary

This sorts the file alphabetically, removes any duplicate entries (-u), and writes the result back to the original file (-o). With both dictionaries similarly prepared, they are ready for the compare operation and synchronization if desired.
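
When a visual comparison is not needed, sort can also do the merge itself, since it accepts several input files at once. A minimal sketch, assuming all inputs are plain word lists without headers; merge_dictionaries is a made-up name and the paths in the example are placeholders.

```shell
# merge_dictionaries OUT FILE...: combine plain word lists into OUT,
# sorted and with duplicates removed. Hypothetical helper name.
merge_dictionaries() {
    local out="$1"
    shift
    # sort reads every input file, -u drops duplicates, -o names the output
    sort -u -o "$out" -- "$@"
}

# Example: build a merged list to review before copying it anywhere
# merge_dictionaries /tmp/merged.txt ~/.hunspell_en_US /path/to/other/dictionary
```

Writing to a separate file first leaves the originals untouched until the merged list has been checked.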

An easy way to check the completeness of a dictionary with Kate is to make sure Kate’s dictionary has the most comprehensive list of words, log out and back in to ensure the dictionary files are read, then open one of the other dictionary files in Kate. Any missing words will be underlined and easy to spot. This is especially helpful when the dictionary is in .json format, where a Meld comparison would be useless.
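
For two plain-text dictionaries, a similar check can be made without opening Kate by using comm, which compares two sorted files line by line. This is a sketch only: missing_words is a made-up name, and it lists words that are in the second file but not in the first.

```shell
# missing_words REFERENCE OTHER: print words found in OTHER but not in REFERENCE.
# Hypothetical helper; both files are assumed to be plain word lists.
missing_words() {
    local a b
    a="$(mktemp)"
    b="$(mktemp)"
    # comm needs sorted input, so sort copies rather than the originals
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    # -1 hides lines only in REFERENCE, -3 hides lines common to both
    comm -13 "$a" "$b"
    rm -f "$a" "$b"
}

# Example: which words does the other dictionary have that Kate's lacks?
# missing_words ~/.hunspell_en_US /path/to/other/dictionary
```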

Location examples

Kate (and most KDE apps)

~/.hunspell_en_US

Firefox (snap)

~/snap/firefox/common/.mozilla/firefox/[user].default/persdict.dat

Onlyoffice (flatpak)

~/.var/app/org.onlyoffice.desktopeditors/data/onlyoffice/desktopeditors/data/dictionaries/all/all.dic
