Konsole vs. Umlauts

Hi,

I’m a very happy new KDE user coming from years of using my dear Cinnamon DE. KDE is very cozy, polished and well integrated and I only have one tiny problem - I can’t seem to get Konsole to display files with umlauts correctly:

$ touch täst
$ ls
't'$'\303\244''st'

I’m running a current Manjaro KDE with all relevant encodings set to UTF-8. Umlauts work well in all other programs. I opened a thread in the Manjaro forum (no links allowed here) when I still thought it was a deeper problem (LibreOffice recent documents seemed to have a problem too at first, but they don’t). I tried all kinds of settings, such as /etc/locale.gen and the regional settings in System Settings (logging out and in each time), and choosing a different font, but have not had any success so far.

Any ideas?

Works for me:

image

Check your LANG environment variable. If it is set to something that isn’t UTF-8, for example the default Unix locale “C”, you get the output you described:

image

This is something that your operating system should have set correctly during setup, but if it failed to do that, you can edit the file /etc/locale.conf to set LANG=en_US.UTF-8, or any other locale ending in .UTF-8 that is listed in /etc/locale.gen. If the locale you want is listed in /etc/locale.gen with a comment symbol (“#”) in front, you also need to uncomment it (remove the “#” symbol) in /etc/locale.gen and then run sudo locale-gen in your terminal.
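
If it helps, here is a minimal sketch for checking whether a locale is already uncommented in /etc/locale.gen before you run locale-gen. The function name is_enabled is made up for illustration, and the sketch assumes the usual "NAME CHARSET" line layout of that file:

```shell
# is_enabled LOCALE [FILE]
# Succeeds if LOCALE is the first field of some line in a locale.gen-style
# file. Commented lines ("#en_US.UTF-8 ..." or "# en_US.UTF-8 ...") do not
# match, because their first field starts with "#".
# (is_enabled is a hypothetical helper, not part of any distribution.)
is_enabled() {
    awk -v want="$1" '$1 == want { found = 1 } END { exit !found }' \
        "${2:-/etc/locale.gen}"
}

# Example use on a live system:
#   is_enabled en_US.UTF-8 || echo "uncomment en_US.UTF-8, then run: sudo locale-gen"
```

The comparison is an exact string match on the first field, so the dot in en_US.UTF-8 is not treated as a wildcard.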

If you want to change the language just for your own user session, you can edit the file .profile in your home directory (known as ~/.profile) and add this command at the end:

eval $(LANG=en_US.UTF-8 locale)

Where you replace en_US.UTF-8 with whatever locale you want, as per the previous instructions.

After making a change, reboot if you edited files under /etc; if you only changed files in your home folder, logging out and back in is enough.

Thanks a lot for the detailed instructions! It’s very comforting to see it working on your machine so I will get there too eventually.

I did try all this before and everything seems according to plan to me:

$ echo $LANG
en_GB.UTF-8

$ cat /etc/locale.conf 
LANG=en_US.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_IDENTIFICATION=de_DE.UTF-8
LC_MEASUREMENT=de_DE.UTF-8
LC_MONETARY=de_DE.UTF-8
LC_NAME=de_DE.UTF-8
LC_NUMERIC=de_DE.UTF-8
LC_PAPER=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_TIME=de_DE.UTF-8

$ grep -v "#" /etc/locale.gen
de_DE ISO-8859-1  
de_DE@euro ISO-8859-15  
de_DE.UTF-8 UTF-8
en_US.UTF-8 UTF-8
en_GB.UTF-8 UTF-8

I don’t have a .profile, and .bash_profile and .bashrc don’t contain any variable settings regarding LANG anymore.

Could any other file interfere and set something non-UTF-8ish?

A couple of things come to mind:

  1. Maybe your profile is set correctly, but for some reason Konsole is being run with incorrect environment? Check if the output of this command makes sense: cat /proc/$(pidof konsole)/environ | tr '\0' '\n' | grep LANG

  2. Maybe your file system doesn’t support UTF-8? Seems unlikely but worth a check - try running findmnt -n --target "$(readlink -f .)" where you are having problems with UTF-8 file names, and drop the result here, so I can take a look.
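
A slightly more defensive variant of the first check, as a sketch: pidof can return more than one PID, in which case the one-liner above breaks. The helper name print_locale_env is made up for illustration; it only assumes that /proc/<pid>/environ is a NUL-separated list of NAME=value entries.

```shell
# print_locale_env FILE
# Print only the locale-related variables (LANG, LANGUAGE, LC_*) from a
# NUL-separated environment file such as /proc/<pid>/environ.
# (print_locale_env is a hypothetical helper name.)
print_locale_env() {
    tr '\0' '\n' < "$1" | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+)='
}

# Example use on a live system, one call per running Konsole process:
#   for pid in $(pidof konsole); do print_locale_env "/proc/$pid/environ"; done
```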

$ cat /proc/$(pidof konsole)/environ | tr '\0' '\n' | grep LANG
LANG=en_GB.UTF-8
LANGUAGE=en_GB

$ findmnt -n --target "$(readlink -f .)"
/ /dev/nvme0n1p2 ext4   rw,relatime

The LANGUAGE variable is missing “.utf-8”. Not sure if that matters? And I hope ext4 doesn’t have an option where it doesn’t support UTF-8. I had read somewhere to use “,utf-8” in the mounting options in /etc/fstab which I tried unsuccessfully as well.

The ext4 file system does support Unicode and doesn’t need additional mount options. That was a long tree to climb, and we probably shouldn’t have climbed it. I was worried about some non-standard mount options or a weird file system, but that isn’t the case here.

The LANGUAGE environment variable doesn’t have the character encoding, and that is how it should work, so that’s fine as well.

Can you show the output of the command locale?

Oh wow there is something I missed so far or it wasn’t there until now:

$ locale
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC=de_DE.UTF-8
LC_TIME=de_DE.UTF-8
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY=en_DE.UTF-8
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER=de_DE.UTF-8
LC_NAME=de_DE.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MEASUREMENT=en_DE.UTF-8
LC_IDENTIFICATION=de_DE.UTF-8
LC_ALL=

I think this is your problem: en_DE is not a locale that exists in Arch (and, by inheritance, I would 99.999% assume not in Manjaro either). Go back to Settings > Region & Language and choose de_DE for those entries, that is, plain German (Deutschland) and not English (Germany). Log in again and check.
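
To spot this kind of mismatch without reading every line by eye, here is a rough sketch that compares the values printed by locale against the list from locale -a. The function names normalize and check_locales are made up for illustration, and the normalization (lowercase, hyphens removed) is only an approximation, chosen so that en_GB.UTF-8 and en_GB.utf8 compare equal:

```shell
# normalize NAME
# Lowercase the name and drop hyphens so "en_GB.UTF-8" matches "en_GB.utf8".
# (An approximation for comparison purposes only.)
normalize() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d -- '-'
}

# check_locales AVAILABLE
# Read `locale`-style NAME=value lines on stdin and print every line whose
# value is not in AVAILABLE (a newline-separated list, as from `locale -a`).
# Empty values such as "LC_ALL=" are skipped.
check_locales() {
    available=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d -- '-')
    while IFS='=' read -r name value; do
        value=$(printf '%s' "$value" | tr -d '"')
        [ -n "$value" ] || continue
        if ! printf '%s\n' "$available" | grep -Fqx "$(normalize "$value")"; then
            printf '%s=%s\n' "$name" "$value"
        fi
    done
}

# Example use on a live system:
#   locale 2>/dev/null | check_locales "$(locale -a)"
```

Any line it prints names a category that points at a locale the system has not generated.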

:heart_eyes: Oh man, I looked at those 100 times and didn’t see it. Of course! You can set “English (Germany)” or “German (Germany)”… I really didn’t see the difference and I guess just went by the flag. But of course that was it:

$ touch täst
$ ls
täst

Thank you guys so much for your great help! I appreciate my now quite perfect setup a lot :star_struck:

I just checked my Debian 11 VM and there’s no such locale there either (in /usr/share/i18n/SUPPORTED). I think it is an invalid locale to begin with, and I don’t know why it exists in KDE. Maybe it is a bug, or at least worth a look by a developer.

You are right. I will file a bug. Thanks so much again!

BTW - if you want an en_DE locale, and it makes sense for you, it’s not hard to make one. For example, I use the en_IL locale, which is a mix of American English spelling, a mix of European and British measurements, Israeli currency, and European-style (but not exactly European) date and time.

Making a mix of locales is not that hard - review /usr/share/i18n/locales/en_IL to see how it was done.

Ah, that’s very cool! In my case it would, for instance, result in the weekdays being displayed in English but with German date formatting (dd.MM.yyyy instead of MM/dd/yyyy). In your case you probably get a mix of Latin and Hebrew letters in addition, and I’m not even sure how that works if one is left to right and the other right to left. Interesting, thanks a lot for the input :pray:

In my (en_IL) case the text is all in English, using custom time/date/address rules from Hebrew with English names; messages and names are otherwise American English; characters and collation are American British and everything else is copied from he_IL - which is all numbers and symbols stuff that doesn’t need translation.
