Crow Translate: Speech synthesis support

Hey, it seems like the speech synthesis option has been dropped in newer Crow Translate (3.1.1).
When I click the button to play the text, no sound comes out.

I am curious whether this is intentional or whether I am missing something.
I’m using the AppImage version of Crow Translate on Ubuntu 22.04.

Btw, fcitx5 also doesn’t work in the input box of CT 3.x.x.
I thought it was because of the transition to the KDE environment, but is there any workaround?

Some error logs

defaultServiceProvider::requestService(): no service found for - "org.qt-project.qt.mediaplayer"
defaultServiceProvider::requestService(): no service found for - "org.qt-project.qt.mediaplayer"
Error: LSTM requested, but not present!! Loading tesseract.
Error: LSTM requested, but not present!! Loading tesseract.
QHotkey: "Failed to register Ctrl+Alt+E. Error: BadAccess (attempt to access private resource denied)"

Hi! Hmm, I don’t hear any sound output either when I try the Fedora RPM-packaged version of Crow Translate.

This seems likely to be a bug, so it would make sense to file it in the KDE Bugtracking System. The Community Wiki guide to reporting issues there is located here: Get Involved/Issue Reporting - KDE Community Wiki

Run this command to install the necessary dependencies:

sudo apt install qtmultimedia5-dev libqt5multimedia5-plugins libqt5multimediawidgets5

Good morning, thanks for the solution. I installed the packages, but it doesn’t work for me. I use Linux Mint Cinnamon and Crow Translate 3.1.0, installed as a Flatpak from the app manager. The translation works great and the text capture works great; the only thing that doesn’t work is the audio. Thanks for your help.
P.S. I’m translating with Crow Translate

This is a known issue with the 3.x branch. Please ask your Linux distribution to ship 4.x, or try the AppImage or build from source yourself: KDE - Experience Freedom!

Without support from the distribution you may get permission errors with OCR though, because .desktop files need to be deployed for KWin to allow arbitrary screenshots.

As for the Flatpak, I’ll have to check with the team; I thought they had already updated it to 4.0.2.

Good morning, thank you for your reply, and I apologize for the delay in responding. I have done a lot of research into Crow Translate, and there is a lot of confusion about the versions and projects found online. I currently have version 4.0.2 installed on Linux Mint. It performs translations and OCR recognition; the only thing I haven’t been able to configure is text reading. I downloaded the Italian models Paola and Riccardo (the *.onnx and *.onnx.json files), put the two folders Paola and Riccardo in /home/MY_NAME/.cache/piper, set piper as the Provider under Text-to-Speech, and pointed Piper voices path at the location of the folders. Unfortunately it can’t play anything. I made many attempts and tests from the terminal with the help of ChatGPT, but I was unable to get Crow Translate to speak. If you can help me I would be grateful; if you need more data, let me know. Thank you and happy 2026

I FORGOT THESE ARE THE ERRORS

You need to point to the top level of the directory where you’ve cloned rhasspy/piper-voices · Hugging Face. I don’t know if Linux Mint has this as a package; you might have to clone the git repository yourself with git clone "https://huggingface.co/rhasspy/piper-voices" . The directory structure must be preserved. If you only want Italian, you still need the top directory with the README and voices.json, but you can remove all the directories other than “it” and all the entries in voices.json other than these two:

    "it_IT-paola-medium": {
        "key": "it_IT-paola-medium",
        "name": "paola",
        "language": {
            "code": "it_IT",
            "family": "it",
            "region": "IT",
            "name_native": "Italiano",
            "name_english": "Italian",
            "country_english": "Italy"
        },
        "quality": "medium",
        "num_speakers": 1,
        "speaker_id_map": {},
        "files": {
            "it/it_IT/paola/medium/it_IT-paola-medium.onnx": {
                "size_bytes": 63511038,
                "md5_digest": "3a44e73b12ca5d0c21a72e388b5847c8"
            },
            "it/it_IT/paola/medium/it_IT-paola-medium.onnx.json": {
                "size_bytes": 7099,
                "md5_digest": "cd471a3757c88a7a4baee6207248b5d5"
            },
            "it/it_IT/paola/medium/MODEL_CARD": {
                "size_bytes": 303,
                "md5_digest": "436971e8acb0a92dd8dbc42542e59d03"
            }
        },
        "aliases": []
    },
    "it_IT-riccardo-x_low": {
        "key": "it_IT-riccardo-x_low",
        "name": "riccardo",
        "language": {
            "code": "it_IT",
            "family": "it",
            "region": "IT",
            "name_native": "Italiano",
            "name_english": "Italian",
            "country_english": "Italy"
        },
        "quality": "x_low",
        "num_speakers": 1,
        "speaker_id_map": {},
        "files": {
            "it/it_IT/riccardo/x_low/it_IT-riccardo-x_low.onnx": {
                "size_bytes": 28130791,
                "md5_digest": "2c564b67f6bfaf3ad02d28ab528929b8"
            },
            "it/it_IT/riccardo/x_low/it_IT-riccardo-x_low.onnx.json": {
                "size_bytes": 4161,
                "md5_digest": "ed24cd550b79acbdc337e519849e9636"
            },
            "it/it_IT/riccardo/x_low/MODEL_CARD": {
                "size_bytes": 260,
                "md5_digest": "3e70f29ab998ac0380edc0cec7395e80"
            }
        },
        "aliases": [
            "it-riccardo_fasol-x-low"
        ]
    },

Ideally this would be handled by your Linux distribution. Please check if they have piper-voices as a package. On Arch Linux I see someone has split this nicely: AUR (en) - piper-voices-it-it
It might also work if you don’t edit voices.json and just remove the non-Italian directories. Try that first if you don’t feel confident editing the file.
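
If hand-editing voices.json feels error-prone, the trimming described above could also be scripted. The following is only a minimal Python sketch, assuming voices.json is a single JSON object keyed by voice name and that every entry has a language.family field like the two entries quoted above. The function names and the path in the usage comment are made up for illustration; nothing here is shipped by Crow Translate or Piper.

```python
import json
from pathlib import Path


def keep_language_family(voices: dict, family: str) -> dict:
    """Return only the entries whose language.family equals `family`."""
    return {
        key: entry
        for key, entry in voices.items()
        if entry.get("language", {}).get("family") == family
    }


def trim_voices_file(path: Path, family: str) -> int:
    """Rewrite voices.json in place, keeping a single language family.

    Returns the number of voices kept. Make a backup of the file first.
    """
    voices = json.loads(path.read_text(encoding="utf-8"))
    kept = keep_language_family(voices, family)
    path.write_text(
        json.dumps(kept, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return len(kept)


# Example (hypothetical location of the clone):
# trim_voices_file(Path.home() / "piper-voices" / "voices.json", "it")
```

Filtering on language.family rather than on the key name keeps every Italian voice, not just Paola and Riccardo, so the result matches whatever is left in the “it” directory.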

Good morning, I solved the problem, thanks for showing me the way. I’ll summarize everything, or rather ChatGPT will summarize it, because it was only thanks to this AI that I managed to solve the problem.

Summary in English

Problem:
Crow Translate with Piper TTS engine did not play any Italian voice.
Running Piper from the terminal always failed with:


Ort::Exception – Protobuf parsing failed

Root causes:

  1. On Linux Mint, the Piper version available (or previously installed) is too old and incompatible with current Piper voice models.

  2. The .onnx voice files from rhasspy/piper-voices were Git LFS placeholders (~100 bytes), not real models, because git-lfs was not installed.
    A valid model must be tens of MB in size.

Solution:

  1. Install git-lfs:

    sudo apt install git-lfs
    git lfs install

  2. Remove the previously downloaded piper-voices directory.

  3. Clone the repository again:

    git clone https://huggingface.co/rhasspy/piper-voices

  4. Verify that .onnx files have the correct size (e.g. ~60 MB for it_IT-paola-medium.onnx).

  5. Install and use an up-to-date Piper binary.

  6. In Crow Translate, set the Piper voices path to the top-level piper-voices directory, preserving its structure.
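
The size check in the steps above can be automated. A Git LFS placeholder is a small text file whose first line is "version https://git-lfs.github.com/spec/v1", so a script can look for that signature instead of guessing from file sizes. This is a minimal Python sketch, not part of Piper or Crow Translate; the function names and the path in the usage comment are assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# First bytes of every Git LFS pointer file.
LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file starts with the LFS pointer signature."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_SIGNATURE)) == LFS_SIGNATURE


def find_lfs_pointers(root: Path) -> List[Path]:
    """List every .onnx file under `root` that is still an LFS pointer."""
    return sorted(
        p for p in root.rglob("*.onnx") if p.is_file() and is_lfs_pointer(p)
    )


# Example (hypothetical location of the clone):
# for path in find_lfs_pointers(Path.home() / "piper-voices"):
#     print("placeholder, not a real model:", path)
```

An empty result suggests the models were really downloaded. Any path that is listed still needs git-lfs to be installed and the files fetched again.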

Result:
After downloading the real models via Git LFS and using an updated Piper version, Piper works from the terminal and Crow Translate successfully plays Italian speech.

Thank you for your precious help. I would like to ask: to add other voice models besides Paola and Riccardo, where can I find them?

A suggestion: Crow Translate is very similar to QTranslate, which I used on Windows, so I would like to propose one feature of that program, if it is possible to implement. When you selected text with the mouse and released the button, a QTranslate icon appeared. Clicking it opened a tiny menu where you could choose whether to translate the text, which opened the window with the translation, or whether to have the text read aloud. There were other features that I don’t remember now (I can specify them in another post; I just need to boot Windows instead of Linux).

Another thing I noticed when using the voice is that several seconds pass between selecting the text and hearing the pronunciation. Does that depend on how long the text is? Is the problem that Piper has to create the WAV file and then play it? I’ve also noticed other small issues with the reading speed, which seems too fast, so the words sound swallowed. The tone and pacing could also be improved. Thanks again, and I apologize that this post has become so long.