How can we get Whisper integration for voice input?

There’s an excellent open-source speech transcription model called Whisper. However, it’s geared toward transcribing subtitles for videos. What we really need is integration with KDE so that voice can be used as an input method. This is an accessibility feature which is currently lacking.

I tried writing a little Python app myself, but working with the audio libraries is a nightmare. For example, there’s no easy way to find the default microphone.
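For what it’s worth, here’s a minimal sketch of the kind of thing I was attempting, assuming the third-party `sounddevice` library (PortAudio bindings), which is one of the few that does expose the default input device directly:

```python
# Minimal sketch: query the default microphone, assuming the third-party
# `sounddevice` library (PortAudio bindings) is installed.

def default_mic_name() -> str:
    """Return a description of the default input device, or a fallback string."""
    try:
        import sounddevice as sd
        # query_devices(kind="input") describes the default input device.
        info = sd.query_devices(kind="input")
        return f"{info['name']} ({info['max_input_channels']} channels)"
    except Exception as exc:  # no PortAudio, no audio hardware, headless box, etc.
        return f"unavailable: {exc}"

print("Default microphone:", default_mic_name())
```

Even this much took a lot of digging; lower-level libraries make you enumerate every device yourself.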

This feature would best be implemented by someone who is familiar with the KDE codebase and knows how to add the settings under the Accessibility tab, wire up the keyboard shortcuts, make a little toolbar widget, and so on.

Perhaps most importantly of all, this should be included upstream: it’s an important accessibility feature which, as is the case on phones, Google Docs, and so on, will be used by large numbers of people.

I can contribute some money to this project; however, I’m just not familiar enough with KDE programming and all of the libraries and frameworks to do the work myself… also, there’s no way I’d ever go anywhere near C++ code. I’m not sure how much money would be needed, and as it’s likely more than I can afford, I wonder if others would chip in.

One last point: we’re talking about a solution which runs on the local machine, so audio isn’t sent to Google or some other company, which would raise privacy concerns. There are also CMU Sphinx and other possible back-ends, which might work on more limited hardware but won’t produce the best results. Depending on system resources, Whisper can also use smaller models.
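As a rough illustration of the model-size point: the model names and approximate memory figures below come from the openai/whisper README, but the selection logic itself is just a hypothetical sketch of what a KDE integration could do at setup time:

```python
# Hypothetical sketch: pick the largest Whisper model that fits a memory budget.
# Model names and approximate VRAM requirements (in GB) are from the
# openai/whisper README; the selection policy here is an assumption.
WHISPER_MODELS = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]

def pick_model(available_gb: float) -> str:
    """Return the largest model whose requirement fits in available_gb."""
    best = "tiny"  # fall back to the smallest model
    for name, required_gb in WHISPER_MODELS:
        if required_gb <= available_gb:
            best = name
    return best

print(pick_model(2))   # a ~2 GB budget gets the "small" model
```

Something like this would let the same feature scale from an old laptop up to a workstation with a GPU.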

See: https://openai.com/research/whisper