Questions about UI automation on KWin Wayland

ngraham · May 31, 2023, 8:13pm

> Oh, I wouldn’t even know where to start. I have absolutely no experience with Wayland.

Well, maybe that’s part of the problem. Wayland has been around for 10 years now. Might be time to learn something new. If I can do it, so can you!

> the user should have the freedom to bypass security features at their own risk.

Yeah, you already see that on other platforms: you launch an app and it shows you a dialog saying, “Please follow these instructions to abuse the accessibility API and grant my app wide permissions so it can behave normally.” It’s quite a terrible UX that annoys and confuses users and amounts to undoing the work done to make the system secure.

What’s more likely, and also more useful, is to allow apps to opt into specific bits of elevated functionality that could potentially be dangerous: “snoop keyboard activity”, “snoop mouse position”, “use webcam” and so on. These requests would be facilitated by the compositor and the portal system, which would show the user an approve/deny dialog, remember settings per app, and let the user change an app’s permissions after the fact, if needed. Basically what you see in iOS and Android. And we already have it for tons of things like screenshots, screen recording, global shortcuts, etc.
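To make that flow concrete, here is a rough toy model of the ask-once, remember-per-app, change-later idea. It is only a sketch: `PermissionBroker`, `Decision`, and the capability strings are hypothetical names made up for illustration, not any real portal or compositor API.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


class PermissionBroker:
    """Hypothetical stand-in for the compositor/portal side of a request."""

    def __init__(self, ask_user: Callable[[str, str], Decision]) -> None:
        # ask_user stands in for the approve/deny dialog shown to the user.
        self._ask_user = ask_user
        # Remembered decisions, keyed by (app id, capability).
        self._remembered: Dict[Tuple[str, str], Decision] = {}

    def request(self, app_id: str, capability: str) -> bool:
        """Return True if the app may use the capability.

        The user is only prompted the first time a given app asks for a
        given capability; afterwards the remembered answer is reused.
        """
        key = (app_id, capability)
        if key not in self._remembered:
            self._remembered[key] = self._ask_user(app_id, capability)
        return self._remembered[key] is Decision.ALLOW

    def set_decision(self, app_id: str, capability: str, decision: Decision) -> None:
        """Let the user change an app's permission after the fact."""
        self._remembered[(app_id, capability)] = decision

    def forget(self, app_id: str, capability: str) -> None:
        """Drop a remembered decision so the next request prompts again."""
        self._remembered.pop((app_id, capability), None)
```

An app would call something like `broker.request("org.example.Macros", "snoop-keyboard")` and get a yes or no back. The point is that the decision lives with the system and the user, not with the app.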

But for a thing that’s not currently supported, it requires the hard work of proposing a protocol, shepherding it through the protocol review process, and then implementing the needed support in your favorite compositor and portal implementation. It’s all possible, and things like this are happening constantly in the background, but it does take time and expertise.

So I totally get that if you’re an app developer, your scope of care is your app; you want the system to facilitate what you want it to do. You don’t want to become a Wayland developer and wait 5 years just to be able to position popups or snoop the keyboard or whatever. And you think “of course the user trusts me and my app; why else would they be using it?” But this attitude is what got us into the mess of X11, where any change to the X server broke scores of important apps and basically killed X development. That’s the hidden story of why Wayland exists. X11 is at a point where its poor architecture and its library of apps doing creative things basically prevent development and hold back the entire platform. So that’s why X11 doesn’t have per-screen scaling, variable refresh rate, HDR, and so on.

Obviously, that’s not ideal either.

So the Wayland transition involves as much of a mindset shift as it does a technical shift. App developers need to start talking more to their upstreams when they encounter something they think they can’t do, rather than finding a creative app-specific hack or shipping an unmaintained patched fork of half the system libraries.

Ultimately something like a “click here to disable protections and shoot yourself in the foot” setting may eventually be implemented. In principle I think it probably should be. But there’s always the danger that it lets app developers be lazy and abuse it instead of either doing things in the correct way, or contributing upstream to make it possible. We don’t want to repeat the death of X. It wouldn’t be a win if implementing this causes people to abuse it instead of doing things the new way; we have to tread carefully.

I think the takeaway is that if you want to help make this better in a way that doesn’t ultimately kill the ecosystem in 30 years, get yourself used to the idea of participating in Wayland protocol politics and compositor development. Time marches on, the past will not be the future, etc.