Questions about UI automation on KWin Wayland

I’m sorry, I don’t have much experience with Wayland, but with Plasma 6 coming out, and X11 no longer being supported on Fedora, I have a few questions and concerns regarding certain features of X11 I have grown to depend on.

How would one handle UI automation with Wayland? Things like auto-clickers (detecting some graphics on screen, moving the mouse pointer to it and clicking), auto keypresses, keylogging, software like Xeyes? How can clients get global mouse position, position of other windows, events in other applications? For example, can a client send a keypress to another client, simulating the user’s keypress (the other client not being aware of that)?

All of that is trivially simple with X11.

As far as I know, Wayland does not implement such protocols, but leaves it to the compositors.

Of course, there are potential security risks (keylogging is most often used maliciously, for example), so such clients would ideally have to be registered in KWin by the user to be given such power. But the functionality of such software is simply too valuable to forbid entirely. X11 is beautifully hacky, it allows for very clever solutions, stuff like automated simulation of user interaction via scripting… For example, one can implement a chat bot as a Bash script that works with pretty much any client, even proprietary ones. I fear I will miss such functionality greatly once I finally decide (or am forced) to switch to Wayland.

For example, I really like Xeyes :eyes: – it is not only a fun little gimmick (which I’ve had in one form or another on all my personal computers since the '80s, including Unix, Amiga, Windows, and Linux) – but also a useful tool: having blindspots in my field of vision, and severe myopia, I got used to it helping me find the mouse pointer on multiple monitors, and it’s much nicer than any pointer highlighting effect that has to be manually triggered.

Is anything like that implemented or planned with KWin Wayland?

I hope that there will be a way to get all those exciting features of X11 on KWin Wayland, but of course, in a more secure way, so that the user has ultimate control over what kind of spying the clients are allowed to do.

1 Like

Until such a thing exists natively on Wayland, there’s a relatively simple workaround: run apps you want to be snooped on in this way through XWayland, and change the XWayland security policy in System Settings > Applications to allow all XWayland apps to snoop on each other.

1 Like

Yes, that is a somewhat useful workaround. However, it is still not enough to make a program like Xeyes (or some alternative) work properly, unfortunately.

There’s also an option to do UI automation via the existing accessibility APIs; see the “Selenium Webdriver using AT-SPI” project (SDK / Selenium Webdriver using AT-SPI · GitLab) and its “writing tests” wiki page. This is a very new and fast-moving project, so it’s not something that’s currently exposed via a user-friendly app. But it might be a fruitful direction to look in.

Beyond that, I’m afraid the more general answer is “get involved in Wayland protocol development and Wayland compositor development to propose and implement what you want.”

But let’s also be serious for a moment: Xeyes is not exactly a true blocker. :slight_smile:

2 Likes

I wish more people took it seriously… The Eyes application has been a part of the desktop interface from the very beginning, even before X11.

Hmm, I don’t think I’ve ever seen a system in the wild with Xeyes on it, and none of mine have either.

Focusing on frivolity like Xeyes distracts from real Wayland problems that still need to be solved, IMO.

It’s not really about Xeyes, at least not only about Xeyes – think of Xeyes as a demonstration of what could be done. Getting global screen coordinates is very useful for countless other tasks regarding UI automation.
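To make concrete how little an Eyes-style program needs from the display server, here is a minimal Python sketch of the pupil placement. It is a simplified circular version written for illustration, not the actual Xeyes code, and the function name and parameters are made up for this example. The only external input is the global pointer position, which is exactly the piece a Wayland client cannot currently obtain.

```python
import math


def pupil_offset(eye_center, pointer, max_radius):
    """Return the (dx, dy) offset of a pupil from its eye center.

    eye_center and pointer are (x, y) tuples in global screen
    coordinates. The pupil follows the pointer but is clamped to a
    circle of radius max_radius around the eye center.
    """
    dx = pointer[0] - eye_center[0]
    dy = pointer[1] - eye_center[1]
    distance = math.hypot(dx, dy)

    # Pointer is inside the eye: the pupil sits directly under it.
    if distance <= max_radius:
        return (dx, dy)

    # Pointer is outside: keep the direction, shrink to the rim.
    scale = max_radius / distance
    return (dx * scale, dy * scale)
```

Calling `pupil_offset((100, 100), (400, 500), 10)` gives an offset pointing from the eye toward the pointer, with length 10. Everything else in such a program is drawing; the hard part on Wayland is getting the `pointer` argument at all.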

And it’s not just a frivolity, it actually has a use. Like I said, it helps me find the mouse pointer on multiple monitors, since I have blind spots in my field of vision and severe myopia. But again, Xeyes is not the main issue here. The issue is that I cannot easily get the global position of the mouse, or easily detect and emit keypresses globally, etc…

Thinking about X11 being phased out in favour of Wayland gives me anxiety. Everything is so locked up that your own desktop does not even let you program a keylogger. I should be able to do that if I want (and suffer the consequences if I’m not careful). The whole philosophy treats the user as too stupid to be trusted with the global position of the mouse cursor; I find it defective by design.

What I really want is the ability to write a C program, in fewer than a couple hundred lines of code, that emits and listens to keypresses, moves the mouse pointer around, clicks on specified coordinates, gets information about windows, etc… Xeyes is just a very familiar and iconic symbol of that.

I don’t find that it’s useful for every discussion about something currently not possible on Wayland to devolve into an essay about Wayland itself. Yes, it’s different, and yes, it’s not possible to do everything on it that you used to do, but it also makes it possible to do a lot of things that are not possible (either easily or at all) on X11.

I get that there are still gaps in Wayland, but that’s why we need experts (like you!) to help fill them. So we’re back to “get involved in Wayland protocol development and Wayland compositor development to propose and implement what you want.”

Oh, I wouldn’t even know where to start. I have absolutely no experience with Wayland.

I just know it’s not about the Wayland protocol itself; that will never be allowed, as it is contrary to the Wayland philosophy. It should be implemented in KWin as an extension to Wayland, with some safeguards. What exactly that would involve, I have no idea.

I was hoping that posting here would get the attention of someone working on KWin, and that I would get some reassurance that what I am talking about is not completely crazy and off the wall, and that the user should have the freedom to bypass security features at their own risk.

Oh, I wouldn’t even know where to start. I have absolutely no experience with Wayland.

Well, maybe that’s part of the problem. Wayland has been around for 10 years now. Might be time to learn something new. If I can do it, so can you!

the user should have the freedom to bypass security features at their own risk.

Yeah, you already see that on other platforms: you launch an app and it shows you a dialog saying, “Please follow these instructions to abuse the accessibility API and grant my app wide permissions so it can behave normally.” It’s quite a terrible UX that annoys and confuses users and amounts to undoing the work done to make the system secure.

What’s more likely–and also more useful–is to allow apps to opt into specific bits of elevated functionality that could potentially be dangerous: “snoop keyboard activity”, “snoop mouse position”, “use webcam” and so on. These requests would be facilitated by the compositor and the portal system to show the user an approve/deny dialog, remember settings per app, and also change the security settings of apps after the fact, if needed. Basically what you see in iOS and Android. And we already have it for tons of things like screenshots, screen recording, global shortcuts, etc.
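As a rough illustration of the opt-in model described above, here is a minimal Python sketch of a per-app permission store. Everything in it (`PermissionStore`, `Decision`, the capability strings, the `ask_user` callback) is hypothetical and invented for this example; it is not the API of KWin or of the portal system, just one way the approve/deny, remember-per-app, and change-later behaviour could fit together.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


@dataclass
class PermissionStore:
    # Maps (app_id, capability) to the decision the user made earlier.
    decisions: dict = field(default_factory=dict)

    def request(self, app_id, capability, ask_user):
        """Return True if app_id may use capability.

        ask_user stands in for the approve/deny dialog. It is only
        called the first time a given app asks for a given capability;
        after that the remembered decision is reused.
        """
        key = (app_id, capability)
        if key not in self.decisions:
            self.decisions[key] = ask_user(app_id, capability)
        return self.decisions[key] is Decision.ALLOW

    def set_decision(self, app_id, capability, decision):
        """Change a stored decision after the fact (the settings page)."""
        self.decisions[(app_id, capability)] = decision
```

In this sketch an app would call something like `store.request("org.example.Eyes", "pointer-position", ask_user)`, and the user could later flip that entry with `set_decision` without the app being involved. Each capability is granted separately, which is the difference from a single "disable all protections" switch.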

But for a thing that’s not currently supported, it requires the hard work of proposing a protocol, shepherding it through the protocol review process, and then implementing the needed support in your favorite compositor and portal implementation. It’s all possible, and things like this are happening constantly in the background, but it does take time and expertise.

So I totally get that if you’re an app developer, your scope of care is your app; you want the system to facilitate what you want it to do. You don’t want to become a Wayland developer and wait 5 years just to be able to position popups or snoop the keyboard or whatever. And you think “of course the user trusts me and my app; why else would they be using it?” But this attitude is what got us into the mess of X11, where any change to the X server broke scores of important apps and basically killed X development. That’s the hidden story of why Wayland exists. X11 is at a point where its poor architecture and its library of apps doing creative things basically prevent development and hold back the entire platform. So that’s why X11 doesn’t have per-screen scaling, variable refresh rate, HDR, and so on.

Obviously, that’s not ideal either.

So the Wayland transition involves as much of a mindset shift as it does a technical shift. App developers need to start talking more to their upstreams when they encounter something they think they can’t do, rather than finding a creative app-specific hack or shipping an unmaintained patched fork of half the system libraries.

Ultimately something like a “click here to disable protections and shoot yourself in the foot” setting may eventually be implemented. In principle I think it probably should. But there’s always the danger that it lets app developers be lazy and abuse it instead of either doing things in the correct way, or contributing upstream to make it possible. We don’t want to repeat the death of X. It wouldn’t be a win if implementing this causes people to abuse it instead of doing things the new way; we have to tread carefully.

I think the takeaway is that if you want to help make this better in a way that doesn’t ultimately kill the ecosystem in 30 years, get yourself used to the idea of participating in Wayland protocol politics and compositor development. Time marches on, the past will not be the future, etc.

5 Likes

Did you try the desktop effect available on Plasma though? There is an effect with two highlights rotating around the cursor, with one being light and the other dark. It’s great, especially when trying to find the cursor in a light-themed editor.

2 Likes

Yeah, I have, and it works fine. One detail I don’t like about it is that it has to be triggered manually from the keyboard, but that’s a minor thing. I’m really fond of Xeyes and I am used to it being around. But that’s beside the point. Like I said, this is not only about Xeyes (though I really like that application; people think I’m joking, but I’m dead serious), but about the bigger picture: UI automation and the freedom to not have your desktop block you when you do things it considers unsafe.

Nathan’s explanation makes a lot of sense, though.

It seems someone already did, as a proof of concept, implement mouse tracking for Weston, just to get Eyes working:

https://blogs.gnome.org/wjjt/2012/03/14/looking-around-wayland/

However, one thing is getting it to work, another is getting it accepted officially.

When we discussed this on another forum, a fellow decided to ask about it on the official Wayland GitLab. As expected, his idea was instantly rejected:
Add a support to get a position of mouse cursor (#383) · Issues · wayland / wayland · GitLab

And yes, such a thing is inherently unsafe (and there should probably be a safety check), but we’ve been dealing with such unsafe issues with X11 for more than 20 years on Linux.

I wonder, how does KWin implement mouse gestures on Wayland? This must be somehow related to the problem of Eyes.


Oh, by the way…

This was one of my first posts on this forum:

It was stated that it possibly should be implemented as part of the compositor. Wouldn’t that be possible already via KWin scripting or effects? Do effects need to be short-lived? Couldn’t Xeyes be programmed as an effect?

2 Likes

Joining in the conversation to reply to this part.
At the moment, mouse gestures are not supported on Wayland and the customization panel has been removed from the settings. At least, from what I know. :slight_smile:

Edit:
https://bugs.kde.org/show_bug.cgi?id=436627

2 Likes