KDE support for Compute Use by AI Agents?

That won’t be a controversial title

Ever since migrating to Wayland, I personally have struggled to automate keyboard/mouse clicks to the same extent as X11. I’ve used ydotool (and a couple others), but desktop automation on Wayland are just more limited and buggier than X11.

Botting games and accessibility got us some desktop automation, but it’s not yet utopia. I’m wondering if AI agents has made the desktop automation a stronger use-case.

I’m not tooo familiar with any KDE roadmap/in progress work; could someone fill me in on if there’s any discussion/upcoming improvements for APIs/something else for this? (i.e open this window, send these keystrokes, click and drag from XY to XZ..)

<3 y’all, apologies in advance if this post strikes a nerve, I haven’t been around lately so idk if this has been discussed to death

Yes, automation under Wayland is lacking but slowing improving. But I doubt AI would provide stronger motivation. On the contrary, since AI has visual capabilities, all it needs is a screencast and control of keyboard and mouse, which is already there with remote desktop protocols.

For example, an AI screen reader might read any text on screen without the app supporting AT-SPI2.

1 Like

Do you know what part of the stack needs the development? Is it like the protocol? Some sort of plumbing? The interface is there, people just needing to make apps?

There’s no universal protocol for a tool to, say, move a window. So currently you have to use private protocols for GNOME, KDE and wlroots-based compositors, separately.

1 Like

Indeed, though accessibility information will still be valuable even with good “reading” capability.

An AI can also interact with other software on an programmatic level, e.g. calling D-Bus methods, preparing a config/profile of an app for a specific task and running that, etc.

It doesn’t necessarily need go the crude way of using UI :slight_smile:

AI is NEVER the answer.

you can use input-remapper to help fill the gaps

i made a simple script to help write the macro code you need to insert if you just want a keyboard shortcut to deliver a text string.

macro generator script
#!/bin/bash
#uncomment the next line for diagnostic help.
# set -x

#this will take written input and write an input-remapper macro to produce
#the exact same string which can be mapped to a keyboard shortcut

textIn=$(kdialog --textinputbox "write the desired macro output" "Hello, World!")
len=${#textIn}
macro=""

for (( i=0; i<$len; i++)) ; do

    ch=${textIn:$i:1}

#covers upper&lower case and a few punctuation characters, or exits if not listed
    case $ch in
        [a-z] | [0-9])
            macro="$macro key($ch)."
            ;;
        [A-Z])
            macro="$macro modify(shift_L,key($ch))."
            ;;
        \`)
            macro="$macro key(KEY_GRAVE)."
            ;;
        \~)
            macro="$macro modify(shift_L,key(KEY_GRAVE))."
            ;;
        !)
            macro="$macro modify(shift_L,key(1))."
            ;;
        @)
            macro="$macro modify(shift_L,key(2))."
            ;;
        \#)
            macro="$macro modify(shift_L,key(3))."
            ;;
        $)
            macro="$macro modify(shift_L,key(4))."
            ;;
        %)
            macro="$macro modify(shift_L,key(5))."
            ;;
        ^)
            macro="$macro modify(shift_L,key(6))."
            ;;
        \&)
            macro="$macro modify(shift_L,key(7))."
            ;;
        \*)
            macro="$macro modify(shift_L,key(8))."
            ;;
        \()
            macro="$macro modify(shift_L,key(9))."
            ;;
        \))
            macro="$macro modify(shift_L,key(0))."
            ;;
        -)
            macro="$macro key(KEY_MINUS)."
            ;;
        _)
            macro="$macro modify(shift_L,key(KEY_MINUS))."
            ;;
        =)
            macro="$macro key(KEY_EQUAL)."
            ;;
        +)
            macro="$macro modify(shift_L,key(KEY_EQUAL))."
            ;;
        [)
            macro="$macro key(KEY_LEFTBRACE)."
            ;;
        {)
            macro="$macro modify(shift_L,key(KEY_LEFTBRACE))."
            ;;
        ])
            macro="$macro key(KEY_RIGHTBRACE)."
            ;;
        })
            macro="$macro modify(shift_L,key(KEY_RIGHTBRACE))."
            ;;
        \\)
            macro="$macro key(KEY_BACKSLASH)."
            ;;
        \|)
            macro="$macro modify(shift_L,key(KEY_BACKSLASH))."
            ;;
        \;)
            macro="$macro key(KEY_SEMICOLON)."
            ;;
        :)
            macro="$macro modify(shift_L,key(KEY_SEMICOLON))."
            ;;
         \')
            macro="$macro key(KEY_APOSTROPHE)."
            ;;
        \")
            macro="$macro modify(shift_L,key(KEY_APOSTROPHE))."
            ;;
        ,)
            macro="$macro key(KEY_COMMA)."
            ;;
        \<)
            macro="$macro modify(shift_L,key(KEY_COMMA))."
            ;;
        .)
            macro="$macro key(KEY_DOT)."
            ;;
        \>)
            macro="$macro modify(shift_L,key(KEY_DOT))."
            ;;
         /)
            macro="$macro key(KEY_SLASH)."
            ;;
        \?)
            macro="$macro modify(shift_L,key(KEY_SLASH))."
            ;;
         ' ')
            macro="$macro key(space)."
            ;;
        *)
            echo "$ch is not allowed"
            exit 1
            ;;
    esac

    ((n++))
    if [[ $n -gt 3 ]] ; then
        macro=$macro"\n"                #keeps output to reasonable line lengths
        n=0
    fi

done

macro="$macro key(KEY_ENTER)"

kdialog --textinputbox "copy the result and paste it\ninto input-remapper's output box" "$macro"

This is layout-dependent, I think?

For example, this assumes that Shift' gives ", which is true for US English but not for many other layouts.

        \")
            macro="$macro modify(shift_L,key(KEY_APOSTROPHE))."
            ;;

yes, you need to use the the shift modifier to access the shifted character, if that’s how your keyboard is set up.

you can also use other modifiers, including the Alt-Gr to access additional characters on your keyboard layout.

for instance i have Alt-Gr set up as the right-alt key so i could use modify(alt_R,key(KEY_APOSTROPHE)) to generate the character on my layout that looks like this

i would need modify(alt_R,modify(shift_L,key(KEY_APOSTROPHE))) to get to the character.

I’ve used claude-code and codex both extensively in kde now, and while nothing directly to support kde, it has no problem rooting around your files to do stuff and using any services you have over network or unix sockets one way or another.

I have Ollama for local things trying to keep it inside/secure, but tools use is pretty abysmal still in general for most models, and haven’t really figure out how to get it talking reliably to my personal MCP server project trying to bridge other gaps, such as actually using the desktop ala kwin-mcp (NOT mine).

You really should use a sandbox with external agents, I mostly use Anthropic’s SRT until I mostly just made my own using bubblewrap to do what it does with more function, but I’m fairly comfortable using it outside the sandbox as well now when I want it to see my entire filesystem for diagnosing things. NSJail, firecracker vm, docker/podman are other options for isolation, but you’ll need to figure out getting file and socket access to things you do need them to talk to.

I use AI a lot these days, everything from hunting down wastes of space to fixing tmux issues to (re)writing my own kde/browser widgets and everything in between. I am not a developer, more network/security/infrastructure, so I speak BGP and firewall policy more than I do Python, and this finally let’s me apply my 25+ years experience actually making myself useful software now I wouldn’t normally have. I’ve also been using it to make network automation and ETL work for migrations from this vendor gear to that vendor gear, which before was a fairly manual and arduous task.

To just call all AI bad is just naive. Blame the bad things on bad people doing bad things with it. Like guns.

Now is it wrecking the planet boiling oceans literally and economies with its thirst for Ram? Sure, but humans were well on the way to destroying the planet and each other before AI, and will continue to do so after. You can’t put the rabbit back in the bag now, and for me, I’ll take the help.

1 Like