Voice Assistant

I was wondering if someone has thought about bringing a voice assistant into Plasma.

There are plenty of open source solutions for both TTS and STT, with support for many languages. I’m thinking of OpenAI’s Whisper for STT and Piper for TTS. Both projects produce high-quality results, and creating a voice assistant shouldn’t be a big issue for the great developers at KDE :wink:

Besides Whisper and Piper, there is a quite powerful noise reduction library named RNNoise, and we could use llama.cpp to run LLMs such as Llama or Mistral locally to answer questions (falling back to ChatGPT on mobile devices or older computers).

The idea would be to have a voice assistant for the desktop, capable of interacting with the installed applications or answering generic questions, for instance:

  • “open firefox”: open Firefox
  • “open discuss dot kde dot org”: open your default browser and go to the specified URL
  • “detect bluetooth devices”: scan for Bluetooth devices and let you connect to them; it could show the desktop wizard as well
  • “set an alarm for 6pm”: an alarm sounds at the specified time
  • “set an appointment for tomorrow at 10”: a new entry with the specified subject is created in KOrganizer
  • “open display settings”: open “System Settings” at the specified section
  • “open downloads folder”/“empty recycle bin”: interact with the filesystem
  • “what’s the weather like?”: return the weather for your location
  • “what’s the distance from the Moon to Mars”/“tell me a low-fat recipe for today”: answer generic questions via Llama/Mistral/ChatGPT
  • “create a python script to connect to the OpenAI API”/“create a bash script to create an rsync backup to a network share”: open Kate and put the script there (or create a frontend to interact with the AI assistant)
  • “Turn the air conditioner off”: connect to the Home Assistant API and control your home.

Possibilities are endless.
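As a rough illustration of the “command” half of this idea, here is a minimal Python sketch that maps a Whisper transcript to a desktop command, returning None when the request should be handed to an LLM instead. The rule table, the `dispatch` function and the “say dot for a period” convention are hypothetical; only `xdg-open` is a real tool, and the app rule assumes the spoken name matches the executable name.

```python
import re
from typing import Callable, Optional

Command = list[str]


def _open_url(m: re.Match) -> Command:
    # Turn a spoken URL such as "discuss dot kde dot org" into "discuss.kde.org".
    host = re.sub(r"\s+dot\s+", ".", m.group(1).strip())
    return ["xdg-open", "https://" + host]


def _open_app(m: re.Match) -> Command:
    # Hypothetical assumption: the spoken name is the executable name ("firefox").
    return [m.group(1).strip()]


# Hypothetical rule table; the first matching pattern wins, so order matters.
RULES: list[tuple[re.Pattern, Callable[[re.Match], Command]]] = [
    (re.compile(r"^open (.+ dot .+)$"), _open_url),
    (re.compile(r"^open ([a-z0-9-]+)$"), _open_app),
]


def dispatch(transcript: str) -> Optional[Command]:
    """Map an STT transcript to a command, or None to fall back to an LLM."""
    # Whisper output usually has capitals and trailing punctuation; normalize it.
    text = transcript.lower().strip().rstrip(".!?")
    for pattern, build in RULES:
        m = pattern.match(text)
        if m:
            return build(m)
    return None
```

A real assistant would pass the returned list to `subprocess.Popen` and send unmatched transcripts (“what’s the weather like?”) to llama.cpp or ChatGPT. A regex table is just the simplest possible intent matcher, chosen here for readability; it does not scale to free-form requests like the window-management examples.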


A few eons ago Mycroft was working on some Plasma integration:

Apparently Mycroft died and the project lives on as OpenVoiceOS, but there doesn’t seem to be any Plasma integration now.

Heh, and even more eons ago there were projects named Simon and KTTS/Jovie, which let you interact with your desktop by voice…

I was part of the Mycroft community and I know the OVOS guys as well. I could run my Mycroft instance to do some of the tasks I proposed, but there is no real interaction with the desktop.

I would love to say something like “hello computer” and have it recognize my voice, unlock itself and answer with “hello malevolent!”, or to say “set the display brightness to 80% and the color temperature to warm white”, “move all the windows from the current desktop to virtual desktop 2”, “swap the windows between monitors” or “silence Telegram notifications for the next 2 hours”, stuff like that… Having a connector for Home Assistant or ChatGPT would be a plus, but controlling the desktop by voice would be the killer feature, IMO.


I would like to have a program in KDE similar to Balabolka on Windows or @Voice on Android; that would make me very happy.

I use @Voice to read a lot of documents aloud to me (PDF, EPUB, etc.), and I think Linux is a bit behind on this. I 100% support the idea of having a TTS engine integrated into KDE and interacting with everything possible in the KDE ecosystem. Piper TTS supports many languages, including mine, Spanish :slight_smile:
If this were solved, it would give me one more reason to switch to Linux. Please, let this ring some bells.

Sorry if I said something wrong, I am not a Linux user yet.


There are several free Linux apps that read documents (PDF, EPUB, TXT) similar to Balabolka. Here are the best options and practical notes:

  1. Speech-dispatcher + frontend reader
  • Speech-dispatcher is the TTS backend used on many distros; it works with various engines (e.g., eSpeak NG, Pico, Festival).
  • Recommended frontends:
    • Okular (KDE): opens PDF/EPUB files and can read them aloud (uses Speech-dispatcher / eSpeak). Good integration with Plasma.
    • Evince (GNOME): PDF viewer with support for TTS via plugins or external scripts.
    • Foliate: EPUB/PDF reader with a modern interface, integrated TTS (e.g., via gTTS, Speech-dispatcher) and reading controls.
  2. Balabolka-like standalone apps
  • GhostReader / Gespeaker: Gespeaker is a GTK interface for eSpeak/MBROLA; it lets you set the voice and speed and save the audio.
  • eSpeak NG + GUI: various GUIs exist to simplify text-to-speech conversion and saving to WAV/MP3.
  3. Modern solutions based on local neural TTS
  • Piper or Coqui TTS (local neural TTS engines), paired with a reader like Foliate or with scripts that handle the audio conversion.
  • Workflow: export the text from PDF/EPUB (pdftotext, ebook-tools), then synthesize it with Piper/Coqui TTS and play it back (e.g., with aplay).
  4. Recommended programs and how to use them (quick)
  • Foliate: excellent for EPUB/PDF, has a “Read Aloud” option (install the foliate and speech-dispatcher/espeak-ng packages).
  • Okular: open the document → Tools → Speak Whole Document (or Speak Current Page); the speech engine can be selected in Okular’s Accessibility settings.
  • Gespeaker: to convert text to audio files and customize local voices.
  5. Installation (typical Debian/Ubuntu commands)
  • Foliate: sudo apt install foliate
  • OKular: sudo apt install okular
  • Speech-dispatcher + eSpeak NG: sudo apt install speech-dispatcher espeak-ng
  • Gespeaker: sudo apt install gespeaker (if available) or build from source.

You’re spoiled for choice: all you have to do is look at the features and choose the one that best suits your needs. :wink:
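The pdftotext + Piper workflow can be sketched in a few lines of Python. This assumes `pdftotext` (from poppler-utils) and the `piper` CLI are on the PATH and that a Piper voice model has been downloaded; the `--model`/`--output_file` flags follow the Piper README, but check `piper --help` for your version, and the function names here are just for illustration.

```python
import subprocess
from pathlib import Path


def pdftotext_cmd(pdf: Path) -> list[str]:
    # "-" as the output file makes pdftotext write the extracted text to stdout.
    return ["pdftotext", "-enc", "UTF-8", str(pdf), "-"]


def piper_cmd(model: Path, wav: Path) -> list[str]:
    # Piper reads the text to synthesize from stdin and writes a WAV file.
    return ["piper", "--model", str(model), "--output_file", str(wav)]


def pdf_to_speech(pdf: Path, model: Path, wav: Path) -> None:
    # Step 1: extract plain text from the PDF.
    text = subprocess.run(
        pdftotext_cmd(pdf), check=True, capture_output=True, text=True
    ).stdout
    # Step 2: feed that text to Piper, which renders it to the WAV file.
    subprocess.run(piper_cmd(model, wav), input=text, check=True, text=True)
```

Usage: `pdf_to_speech(Path("book.pdf"), Path("voice.onnx"), Path("book.wav"))`, where `voice.onnx` is a placeholder for whichever Piper voice you downloaded (Spanish voices are available). Very long documents may be better split into chapters first.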