My Accessibility Stack and the future on Wayland

Hello there, I’m not too familiar with the innards of KDE and where to post this-- Here’s hoping I’m in the right place. I’m also dropping this on the GNOME Discourse at the same time. Just trying to get some eyes on it. It’s about the situation of accessibility on Wayland, and how soon I’ll no longer be able to use my favorite desktop environment once X11 support has been removed.

Let me know if there’s a particularly good person to get eyes on this article. @ngraham I’m pinging you in particular, as I’ve been watching your community engagement work from the sidelines for years now and you seem like you’ll definitely know the right person to get in touch with. (Not to mention that I quote you in particular…)

My Accessibility Stack and the future on Wayland – Insane Rambles About Technology

Thanks for writing that. I can feel your apprehension and disappointment, and those are 101% valid emotions to feel over this situation.

I can’t make any promises, but the information in this post is on my radar screen. I’ll see what options are available.

I think it’s definitely on our TODO list to make sure apps with “Adversarial accessibility” (I like that phrase) can exist.

Whether Talon specifically gets Wayland support isn’t something that we can really be in control of, especially given the maintainer comments. I get the frustration; everyone is frustrated with everyone. Nor can we be in control of getting to a point where things work on all desktops.

But we can see what we can do.

I installed Talon and did some quick inspection. The API docs are lacking. We can reverse engineer a bit with Scripting → REPL, which gives an interactive console, but it’s probably easier to see what scripts actually use by grepping the community scripts.

Grepping for function calls in the Talon community scripts led to the following:

  • talon.ui.active_app
  • talon.ui.active_window
  • talon.ui.active_workspace
  • talon.ui.apps
  • talon.ui.launch
  • talon.ui.main_screen
  • talon.ui.register
  • talon.ui.screen_containing
  • talon.ui.screens
  • talon.ui.switch_workspace
  • talon.ui.windows

All of these are basically just the task manager protocol.

Granted this isn’t very standardised. We (KDE) could pivot to the wlroots one with our stuff extended on top. It’s a lot of boring churn to get to the same result, but if it helps fix all these it might be worth it.
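For anyone who wants to repeat the grep, here is a rough Python sketch of that kind of tally. It assumes the community scripts are checked out locally. The function names and the regex are my own illustrative choices, not anything Talon provides. It only looks at .py files, and a regex like this will miss aliased imports and may pick up a few false positives.

```python
import re
from collections import Counter
from pathlib import Path

# Matches dotted references such as "ui.active_window" or "actions.user.paste".
# An optional "talon." prefix is dropped so both spellings are counted together.
API_REF = re.compile(r"\b(?:talon\.)?((?:ui|actions)\.[A-Za-z_][\w.]*)")


def count_api_refs(text: str) -> Counter:
    """Count ui.* and actions.* references in one file's text."""
    return Counter(m.group(1).rstrip(".") for m in API_REF.finditer(text))


def count_tree(root: Path) -> Counter:
    """Sum the counts over every .py file under root."""
    total = Counter()
    for path in root.rglob("*.py"):
        if path.is_file():
            total += count_api_refs(path.read_text(errors="ignore"))
    return total
```

Something like count_tree(Path("community")).most_common(15) then gives the most frequently referenced names, where "community" stands in for wherever the checkout lives.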

  • talon.actions.key
  • talon.actions.insert
  • talon.actions.user.formatted_text
  • talon.actions.sleep
  • talon.actions.user.emacs
  • talon.actions.user.idea
  • talon.actions.user.vscode
  • talon.actions.edit.left
  • talon.actions.edit.selected_text
  • talon.actions.user.paste probably just yolos control+v

With 95% of calls being just actions.key.

This one has docs! It’s super trivial: key() action | Talon Community Wiki
It’s relatively straightforward: it sends a keysym to the active app. It doesn’t seem to support sending to arbitrary apps, which makes life easy.

Within the Wayland ecosystem that means any of:

  • portal + libei
  • libei directly
  • wlr_virtual_keyboard / kde_fake_input
  • or being an input method…
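Whichever of these backends is used, the client side probably looks much the same: turn a key string into press/release events, then hand them to the backend. Below is a minimal sketch of that first half. It assumes a Talon-like syntax, with modifiers joined by "-", an optional ":N" repeat count, and several chords separated by spaces. The type and function names are made up for illustration. It does not handle ":down"/":up" or a literal "-" key, and the emit step (libei, a virtual keyboard protocol, etc.) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEvent:
    key: str       # symbolic key name, e.g. "ctrl" or "a"
    pressed: bool  # True for a press, False for a release


def expand_chord(chord: str) -> list[KeyEvent]:
    """Expand one chord such as "ctrl-shift-a" or "left:3" into events."""
    name, _, count = chord.partition(":")
    repeat = int(count) if count else 1
    *modifiers, key = name.split("-")
    # Hold the modifiers, tap the key `repeat` times, then let go.
    events = [KeyEvent(m, True) for m in modifiers]
    for _ in range(repeat):
        events += [KeyEvent(key, True), KeyEvent(key, False)]
    # Release modifiers in reverse order of pressing.
    events += [KeyEvent(m, False) for m in reversed(modifiers)]
    return events


def expand_keys(spec: str) -> list[KeyEvent]:
    """Expand a space-separated sequence of chords."""
    return [ev for chord in spec.split() for ev in expand_chord(chord)]
```

Keeping the parsing separate from the emitting means the same event list could be fed to any of the options above, which is about the only part that differs between compositors.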

The rest seem to be clipboard operations.

Clipboard has two options:

  • the portal approach
  • wlr_data_control

The key question to answering what we do and what these apps should do is whether we care about shipping talon inside a sandbox (we probably don’t) and whether we (KDE) care about having things available to non-sandboxed apps.
We seem to have generally pivoted to agreeing it’s not that problematic, at which point we should make an effort to standardise some of these things. That absolutely won’t fly with GNOME, but it’s also a chance for us to be ahead on some practical tasks.


In terms of actionable steps, I’m not sure what the next steps are.

We could write some simple Python that demonstrates that all these tasks are doable. It probably could even be trampolined on top of upstream Talon and “just work”.

I have lots of questions as to how this works.

Most importantly, let’s say I have a trigger that invokes an action in a given app (let’s say sending space to Spotify to pause on a voice keyword) and Spotify is on another desktop. Does this just switch desktops, replay a key event, then switch back?

If you have a similar setup (On X), can you run:

xprop -root -spy

then trigger this, I would love to see the output.

Very good write up!

It is sad that so many app developers are still very hostile towards Linux.

They demand that their Linux users essentially do all the Linux support themselves.

Obviously the Talon developer is especially nasty, moving existing support to paid tiers in order to increase the pressure for free labor or forced payment to just keep the existing support running.

Let alone forcing those pressured enough to be willing to do the work themselves, to work blind, implementing things based on guessing and reverse engineering (like @David_Edmundson just did :heart:)

Ultimately the people who use such tools might be better off if some of them can come together to implement a replacement, out of the reach of this sad Linux hostility.

Many of the windowing things are already possible with kdotool for KDE. And a recent wdotool project enables key input for GNOME, KDE and wlroots. They can be stop-gap solutions before a proper, standard protocol is available.

I guess needing to implement things separately for at least GNOME, KDE and wlroots is one of the reasons the dev pulled Linux support.

there is also xdotool and ydotool

now someone just needs to come up with zdotool that works on everything everywhere.

so many tools and not enough letters in the alphabet.

As someone trying to make a cross platform game work identically on different platforms, I am unsurprised at the Talon author’s unwillingness to invest huge amounts of time trying to figure out how to make it work consistently on different Linux desktop environments now that Wayland has made that task painful and tedious. My hope is that stories like yours encourage much greater willingness from Wayland developers to engage with app developer concerns and make suitable and practical compromises and changes.

That is exactly what is needed.

However, I don’t understand why you were mocking those who developed all these non-perfect tools.

i’m giving a shout out to their naming ritual.

Oh I have experience in this area to share - for context, I’m the xdotool maintainer, software used by lots of accessibility and utility tools. Most times I see this kind of paste function, it’s really an expression of “I, the user, wish to insert the following text into the currently focused text field” – and the clipboard is simply the lowest friction way to achieve it.

There are two ways that I know of that applications achieve this, and I do not believe either of them are the best approach:

  • They send keystrokes using some virtual keyboarding interface (XTest, XDG RemoteDesktop, libei, etc.)
  • Or, they insert text into the clipboard, then usually “yolo ctrl+v”, as you suggest. Some apps (KeePass) are aware that ctrl+v is not always the “paste” action and try to use something else (shift+insert, ctrl+shift+v, etc) depending on the focused application, which requires knowing the application that has focus.
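The second approach ends up needing a table along these lines. This is only a sketch: the application identifiers and the table contents are illustrative assumptions, not what KeePass or any other tool actually ships.

```python
from typing import Optional

# Paste shortcut per focused application; anything unlisted gets ctrl+v.
# Keys are lower-cased application identifiers (for example an X11 WM_CLASS
# or a Wayland app_id). The entries are examples only.
DEFAULT_PASTE = "ctrl+v"
PASTE_OVERRIDES = {
    "konsole": "ctrl+shift+v",
    "gnome-terminal": "ctrl+shift+v",
    "example-legacy-app": "shift+insert",  # hypothetical entry
}


def paste_shortcut(app_id: Optional[str]) -> str:
    """Return the shortcut to send for the focused app, or the default."""
    if not app_id:
        # Focus is unknown, so all we can do is guess.
        return DEFAULT_PASTE
    return PASTE_OVERRIDES.get(app_id.lower(), DEFAULT_PASTE)
```

The lookup is trivial, but it only works if the tool can ask which application has focus, and it has to be maintained by hand for every app that deviates from ctrl+v.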

I think a better approach would be a way for these programs to express “insert the following text into the current application’s focused text field”. We already have this concept in input method systems (IBus, fcitx, etc). The IME has a “commit text” concept which achieves exactly what most of these applications are after.

Maybe this could be something to work towards?

I’m working on porting xdotool to “wayland” (which means targeting each compositor separately, I’m afraid). I’ve got a good amount of research done to see if it’ll work and I’m moving forward :slight_smile:

Right now, I’ve got virtual input working on kde, gnome, and wlroots (all different APIs) and hope to have something to release soon.

Cool, talk to us about long-term stuff as well as doing short-term workarounds.

It’s super dumb that we have this state of “it shouldn’t be standardised as it’s a niche” then everyone exposing their own thing anyway.

I’ve got a meeting in a few weeks about standardising a way to let every (non-sandboxed) app start videocasting and doing fake input, so the tide is definitely turning on all of this.

Hi everyone, in between all the other places I posted this and the Matrix conversations I totally missed this particular thread.

To krake’s point: I really, really want to dissuade this kind of thought process. I’d rather look at it from the developer’s point of view, which mcarans pointed out-- it’s more like the Wayland ecosystem is demanding that application developers do their platform development for them (and then have their needs/concerns discarded anyway…) And this kind of speech is exactly the kind of thing that burns people out.

I would much rather be paying Aegis for his time than hope to replicate his many years of intensive R&D into speech-based computing, rolled up into a product.

David already got the rough Wayland support list from me but I’m reposting it here: wayland-accessibility-notes/talon-requirements.md at main · splondike/wayland-accessibility-notes · GitHub

Also to David, regarding your multi-desktop comment, I don’t think(?) the example you’re looking for exists. But instead I put Dolphin on Desktop 2, Firefox on Desktop 1, then ran your monitoring tool while I focused Dolphin then focused Firefox. Here’s what that looks like:

_NET_CLIENT_LIST_STACKING(WINDOW): window id # 0x1602912, 0x16044da, 0xa40003d, 0x7400004, 0xb60000f, 0xc00000f, 0x900000e, 0x5e00004, 0x680000a, 0x5600108, 0x5600017, 0x560011a, 0x5c00034, 0x560012c, 0x5601449, 0x740000b, 0x740026a, 0x740011a, 0x5600b80, 0x860000f, 0x5a0000f, 0x160298f, 0x160445e
_NET_CURRENT_DESKTOP(CARDINAL) = 1
_NET_ACTIVE_WINDOW(WINDOW): window id # 0x0
_NET_CLIENT_LIST_STACKING(WINDOW): window id # 0x1602912, 0x16044da, 0xa40003d, 0x7400004, 0xb60000f, 0xc00000f, 0x900000e, 0x5e00004, 0x680000a, 0x5600108, 0x5600017, 0x560011a, 0x5c00034, 0x560012c, 0x5601449, 0x740000b, 0x740026a, 0x740011a, 0x5600b80, 0x860000f, 0x5a0000f, 0x160298f, 0x160445e
_QT_GET_TIMESTAMP(INTEGER) = 
_QT_GET_TIMESTAMP(INTEGER) = 
_NET_ACTIVE_WINDOW(WINDOW): window id # 0x5a0000f
_NET_CLIENT_LIST(WINDOW): window id # 0x5600017, 0x5600108, 0x560011a, 0x560012c, 0x5c00034, 0x5e00004, 0x680000a, 0x7400004, 0x740000b, 0x740011a, 0x5600b80, 0x900000e, 0x5601449, 0xa40003d, 0x1602912, 0x160298f, 0x740026a, 0xb60000f, 0xc00000f, 0x160445e, 0x16044da, 0x5a0000f, 0x860000f
_NET_CLIENT_LIST_STACKING(WINDOW): window id # 0x1602912, 0x16044da, 0xa40003d, 0x7400004, 0xb60000f, 0xc00000f, 0x900000e, 0x5e00004, 0x680000a, 0x5600108, 0x5600017, 0x560011a, 0x5c00034, 0x560012c, 0x5601449, 0x740000b, 0x740026a, 0x740011a, 0x5600b80, 0x860000f, 0x5a0000f, 0x160298f, 0x160445e
_NET_CLIENT_LIST_STACKING(WINDOW): window id # 0x1602912, 0x16044da, 0xa40003d, 0x7400004, 0xb60000f, 0xc00000f, 0x900000e, 0x5e00004, 0x680000a, 0x5600108, 0x5600017, 0x560011a, 0x5c00034, 0x560012c, 0x5601449, 0x740000b, 0x740026a, 0x740011a, 0x860000f, 0x5a0000f, 0x5600b80, 0x160298f, 0x160445e
_NET_CURRENT_DESKTOP(CARDINAL) = 0
_NET_ACTIVE_WINDOW(WINDOW): window id # 0x0
_NET_CLIENT_LIST_STACKING(WINDOW): window id # 0x1602912, 0x16044da, 0xa40003d, 0x7400004, 0xb60000f, 0xc00000f, 0x900000e, 0x5e00004, 0x680000a, 0x5600108, 0x5600017, 0x560011a, 0x5c00034, 0x560012c, 0x5601449, 0x740000b, 0x740026a, 0x740011a, 0x860000f, 0x5a0000f, 0x5600b80, 0x160298f, 0x160445e
_QT_GET_TIMESTAMP(INTEGER) = 
_QT_GET_TIMESTAMP(INTEGER) = 
_NET_ACTIVE_WINDOW(WINDOW): window id # 0x5600b80
_NET_CLIENT_LIST_STACKING(WINDOW): window id # 0x1602912, 0x16044da, 0xa40003d, 0x7400004, 0xb60000f, 0xc00000f, 0x900000e, 0x5e00004, 0x680000a, 0x5600108, 0x5600017, 0x560011a, 0x5c00034, 0x560012c, 0x5601449, 0x740000b, 0x740026a, 0x740011a, 0x860000f, 0x5a0000f, 0x5600b80, 0x160298f, 0x160445e
_NET_CLIENT_LIST(WINDOW): window id # 0x5600017, 0x5600108, 0x560011a, 0x560012c, 0x5c00034, 0x5e00004, 0x680000a, 0x7400004, 0x740000b, 0x740011a, 0x5600b80, 0x900000e, 0x5601449, 0xa40003d, 0x1602912, 0x160298f, 0x740026a, 0xb60000f, 0xc00000f, 0x160445e, 0x16044da, 0x5a0000f, 0x860000f
_NET_CLIENT_LIST_STACKING(WINDOW): window id # 0x1602912, 0x16044da, 0xa40003d, 0x7400004, 0xb60000f, 0xc00000f, 0x900000e, 0x5e00004, 0x680000a, 0x5600108, 0x5600017, 0x560011a, 0x5c00034, 0x560012c, 0x5601449, 0x740000b, 0x740026a, 0x740011a, 0x860000f, 0x5a0000f, 0x5600b80, 0x160298f, 0x160445e
_QT_GET_TIMESTAMP(INTEGER) = 
_QT_GET_TIMESTAMP(INTEGER) = 
_QT_GET_TIMESTAMP(INTEGER) = 
_NET_CLIENT_LIST_STACKING(WINDOW): window id # 0x1602912, 0x16044da, 0xa40003d, 0x7400004, 0xb60000f, 0xc00000f, 0x900000e, 0x5e00004, 0x680000a, 0x5600108, 0x5600017, 0x560011a, 0x5c00034, 0x560012c, 0x5601449, 0x740000b, 0x740026a, 0x740011a, 0x5a0000f, 0x5600b80, 0x860000f, 0x160298f, 0x160445e
_NET_ACTIVE_WINDOW(WINDOW): window id # 0x860000f
_NET_ACTIVE_WINDOW(WINDOW): window id # 0x860000f
_QT_GET_TIMESTAMP(INTEGER) = 
^C

Does that help?
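In case it makes traces like this easier to skim, here is a small Python sketch that reduces xprop -root -spy output to just the desktop and active-window changes. The function name is mine, and the regexes only cover the two line shapes seen in the output above.

```python
import re

DESKTOP = re.compile(r"^_NET_CURRENT_DESKTOP\(CARDINAL\) = (\d+)")
ACTIVE = re.compile(r"^_NET_ACTIVE_WINDOW\(WINDOW\): window id # (0x[0-9a-f]+)")


def focus_events(lines):
    """Yield ("desktop", n) and ("active", window_id) tuples in order.

    An event identical to the previously yielded one is dropped, since
    xprop often reports the same active window twice in a row.
    """
    last = None
    for line in lines:
        event = None
        if m := DESKTOP.match(line):
            event = ("desktop", int(m.group(1)))
        elif m := ACTIVE.match(line):
            # Window ids are printed in hex, e.g. 0x5a0000f.
            event = ("active", int(m.group(1), 16))
        if event is not None and event != last:
            last = event
            yield event
```

Piping xprop into a script that loops over focus_events(sys.stdin) and prints each event would show the desktop switch, the brief window id 0x0, and the newly focused window as three short lines instead of a wall of stacking lists.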

Also, off-topic, but this account is fairly old and I’d like to get the username updated to “nocoffei” to match everywhere else. Who should I talk to to make that happen? It doesn’t seem that I can do it myself.

This is a common misconception which sadly gets perpetuated a lot.

The Wayland stack has a lot of stakeholders (driver, compositor, frameworks/toolkits, apps/desktop), each usually working on their respective part but also collaborating on things they share (e.g. driver interfaces, protocols).

Many even contribute to multiple parts, e.g. a compositor developer providing patches for a toolkit or vice versa.
For example this protocol request is the work of an app/desktop developer who additionally contributed the implementation for one compositor and two frameworks.

This stands in stark contrast to the alleged standpoint of the Talon Voice developer, who will not even talk to people, which is the very basic requirement for collaboration.

And then using this as an excuse to pull a veritable drug dealer move by taking away what was previously freely available and making it “pay or else I’ll remove it altogether”.
Charming fellow.

Luckily it appears that there are some people, such as yourself, David and Jordan, who are willing to work around such petty closed-mindedness.

Done

Heh, well… how to put this to words. I suspect you are putting me in a group in which I do not belong? This requires some background, and I apologize in advance for the length, but such background sharing is necessary to give context to where my head is at regarding accessibility on wayland/kde/gnome/etc.

I’ve been maintaining xdotool for ~18 years. This tool is used by numerous accessibility and utility tools (keepass, etc) as a way to perform tasks that are necessary and useful, but are not easy to program, on X11. This tool allows programmatic access to otherwise-human-only activities such as virtual input and window+desktop management. Think of xdotool mainly as a way to instruct a computer to perform an action that typically requires physically striking keys or interacting with a physical mouse (typing, dragging windows, etc). On X11, this meant basically two things available in X11: XTest (for virtual input) and EWMH (for window management API).

When Wayland was first shipped on Fedora 10 years ago, I noticed two things:

  1. Every capability offered by xdotool was impossible because Wayland simply shipped without any way to do these things.
  2. And, that every person I talked to, blog post I read, news article, etc, praised Wayland for omitting these capabilities and congratulated the project’s attention to security.

The lack of simulated input was praised. The lack of ability for windows to request positions - praised. The inability to determine where the mouse cursor was - praised. Etc. All of these were things folks used xdotool for – controlling window layouts, scripting multi-step tasks, etc because running one script was easier than a manual 20-step task requiring a degree of dexterity they no longer possessed.

Meanwhile for Wayland’s launch, at-spi2 and Orca lost capabilities, input engines have problems, xdotool was not a possible thing, etc. Outside of accessibility, we lost screenshot/sharing, remote desktop, drag-and-drop between apps, etc. This project shipped without these things and received praise for it!

(end background)

I am very enthused that this thread shows alignment towards a better future, but I am still quite worn out by discussing the topic, and I still have only limited hope that it will improve - 10+ years is a long time to collect disappointments, ya know? :slight_smile:

The transition to Wayland created a fragmentation problem - now every compositor has to agree (or not!) to every thing… and I don’t even know who to ask! There’s no leaders, no central group that I knew of, and on my way to figure things out, I find that everything is rejected. It’s disappointing.

If you get a particular stroke of bad luck, you actually visit the right places, but meet the wrong person…

Just a few days ago, someone on the KWin matrix channel asked about the ability to position windows programmatically and it was immediately shut down by someone else who insisted that this capability was insecure and unnecessary… as if it was a Wrong Idea.

“Talon Voice developer who will not even talk to people”

Please accept the possibility that the Talon developer, like me, has tried talking to people, but the people they met insisted that their use case, and their API needs, were impossible, insecure, and wrong.

For my part in all of this, I am happy to share what I’ve seen people do on X11 - with xdotool and others - that are currently not possible, or extremely difficult, on Wayland.

I think the Talon developer is, in many ways, extremely correct in their choice. A business decision to target “Wayland” is difficult because “Wayland” is at least 3 different platforms (GNOME, KDE, wlroots) and is likely to grow more.

I will continue working towards xdotool for Wayland, and none of my current planned work requires any changes from KDE - I use all the APIs I can find: xdg portal, wayland, dbus+kwin javascript, etc. Once released, I hope for users it will serve as a bridge for the tools abandoned in the transition to Wayland and to restore capabilities lost. I hope for compositor developers, like KDE, that it will serve as evidence that the “impossible! insecure!” rejections are, actually, already possible, and that we would all be better served if these capabilities, or even the use cases, were well supported.

Sorry, you seemed to be saying that you are working on improving the situation.

Well, these need to be seen in their context.

Simulating user input, for example, is essentially impersonating the user.
This should not be possible for any random program at any given time without the user’s explicit consent.

A remote access application, for example, which has asked for that privilege can do so through the respective API.

An application’s primary window should open either where the user closed it (restoring a saved situation) or where the user wants it to, according to their preferences (close to cursor, current monitor vs primary monitor, centered vs least overlapping, etc).

The compositor knows all that while the applications typically do not.

For secondary windows the application can request almost any position on or around their primary window.

Applications get mouse cursor position updates whenever the mouse is on any of their windows, nicely mapped to the coordinate system of that window.

Well, accessibility seems to still have some gaps, but the other things have been working for quite a while.

As did window managers on X11, e.g. EWMH

My understanding is that all shared protocol work and base library implementation is done on freedesktop.org, just like X11 development before and ongoing.

The article at the top of this thread suggested that they have indicated unwillingness to do so.
Since that sounded awful I wrote “allegedly”.

The same article claimed that they pulled a “bait & switch” which again might not actually be true.

Well, at least to me this suggests that you are in the group of people who want to improve things, even if you say you are not.

I think the quote was about the absolute position, not dependent on any specific window. For example, to simulate mouse movement or clicking for automation or accessibility (when controlling it via voice or eye movement), or for displaying something near the mouse cursor. For example, I was working on a simple Talon helper app that would display something as an indication that some command was triggered when it does not have any obvious immediate result.

The article did not call it bait & switch. I think the idea that it is a bait & switch does not make any sense. If he wanted more money, there are much more effective ways than doing something like that just for Linux users.

I would expect this to be the same as a remote desktop app controlling the mouse from the “other side”.

Yes, that might currently not be possible.
The question is: does this need the cursor position, or an API to show a surface near the cursor position, like a tooltip?

Yeah, it is probably even worse.

Tricking people into thinking that something was free when it was actually just free for a hidden trial period.
Getting people to invest time to learn and make use of the demo and then, when they have come to rely on it, requiring them to pay up.

Maybe we should call it a protection racket instead?

I bet the Linux users were just a good test group.
If they put up with this and “pay up”, users on other platforms will be next.

It is sadly how a lot of “entrepreneurs” react to getting near monopoly status in a given field.

The only long term viable solution is investment into a FOSS alternative as that can always be forked and continued by others if necessary.

It is not a service that can be taken away. It is a fully offline app.
And I would say it is feature complete already. Talon is more like a framework, it does not ship any ready to use setups etc.
So even if its development completely stops, the only issue would be the support for future platforms, such as Wayland.

But the article claims that the plan was to stop releasing the Linux version and only keep the X11 support for the paid tier (I am not sure what the source of this claim is; so far nothing has changed). So it would not help those who started to rely on it and want to continue using it on Wayland; it’s not as if Wayland support were being offered for payment.
Don’t you think the reason for such a decision has nothing to do with “entrepreneurship”?
To me it makes sense: if most of the major distros have migrated to Wayland and there is no reasonable way to add support for it, then keeping only an X11 version available for everyone just results in lots of frustrated people coming to complain.
And those who remain on X11 can simply continue using the last available version, since it’s unlikely to require any changes to continue supporting X11. Or even if they need some latest features, they can just subscribe to Patreon for a month and download it.