Recently I’ve been developing sight-free-talon, a program for non-visual voice interaction. To my knowledge, it is the only actively maintained, cross-platform solution for this task. All code for this project can be found in the sight-free-talon GitHub repo.

Using low-vision tools like screen readers alongside voice commands or dictation is deceptively hard:

  • Sophisticated voice input tools like Rango or Cursorless are fantastic for sighted users, but they depend heavily on visual markers.
  • Screen readers like NVDA depend on keyboard input to move focus.
  • General challenges with voice input, such as misrecognitions, are much more tedious to correct non-visually.

The more development I have done, the more strongly I feel that general-purpose voice control will drastically reduce friction in non-visual computer use and do more than just help those with RSI. It is well positioned to make skimming large amounts of prose or code significantly easier, and to improve navigation speed and the general experience within complicated interfaces.

  • By using platform accessibility APIs, I support commands that let the user say the name of an element and jump directly to it or click on it. There is no need to repeatedly press tab or the arrow keys to move the screen reader or system focus.
  • With language model support from my other project, talon-ai-tools, we can quickly summarize or ask about the context of a text selection, describe images or page layouts, and control our computers with natural language through the OpenAI function calling API. This could all be done manually, but voice commands drastically reduce the friction of LLM interaction. No copying and pasting needed.
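To make the first point concrete, here is a minimal sketch of the idea of jumping to an element by its spoken name. It is not the actual sight-free-talon code and it does not call any real Talon, NVDA, or platform accessibility API. The `Element` class, the `focused` flag, and the helper functions are all hypothetical stand-ins for an accessibility tree; a real implementation would ask the platform for the tree and for focus changes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for one node of a platform accessibility tree."""
    name: str
    role: str = "unknown"
    children: List["Element"] = field(default_factory=list)
    focused: bool = False


def walk(root: Element) -> Iterator[Element]:
    """Yield every element under root, depth first, in document order."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so spoken text compares cleanly."""
    return " ".join(text.lower().split())


def find_by_spoken_name(root: Element, spoken: str) -> Optional[Element]:
    """Prefer an exact name match; otherwise take the first partial match."""
    target = normalize(spoken)
    if not target:
        return None
    partial = None
    for element in walk(root):
        name = normalize(element.name)
        if name == target:
            return element
        if partial is None and name and target in name:
            partial = element
    return partial


def focus_by_spoken_name(root: Element, spoken: str) -> Optional[str]:
    """Move the (simulated) focus and return text a screen reader could speak."""
    element = find_by_spoken_name(root, spoken)
    if element is None:
        return None
    for node in walk(root):
        node.focused = False
    element.focused = True
    return f"{element.name}, {element.role}"


# A tiny made-up window to try the helpers on.
window = Element("Browser", "window", [
    Element("Toolbar", "toolbar", [
        Element("Back", "button"),
        Element("Reload", "button"),
    ]),
    Element("Search the web", "edit"),
    Element("Submit", "button"),
])

announcement = focus_by_spoken_name(window, "submit")
print(announcement)
```

In this toy tree, saying "submit" moves focus straight to the Submit button in one step, where tabbing would have passed through every control before it. Exact matches win over partial ones so that a short name like "back" is not shadowed by a longer label that happens to contain it.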

My program is an extension built on Talon Voice and a variety of screen readers, with NVDA currently being the best supported. Talon is challenging for new users, and the fact that it is community-driven can sometimes make standardization difficult. Nonetheless, it is by far the most customizable and robust option for voice-controlled computer use. As Talon’s UX continues to improve with better installation and package management, I can envision it becoming a standard tool for use alongside screen readers.

Regardless of how this progresses, I love HCI projects because they have tangible, real-world benefits for people. There is so much potential for innovation simply by reconsidering our assumptions and recontextualizing existing technologies. It has been a real pleasure interacting with people from the NVDA, Orca, and Talon communities, many of whom I now consider friends.