Demo Video and General Introduction
Installing
- Clone the repo into your Talon user directory.
- Clone the Talon community repository for general Talon commands.
  - This is the sole OS-agnostic dependency of this project.

You should frequently run `git pull` to update the scripts.
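If you keep several repositories in your Talon user directory, the update step can be scripted. The following is a minimal sketch, not part of this project: the helper names are hypothetical, and it assumes each repository is a direct child of the user directory and was cloned with git.

```python
import subprocess
from pathlib import Path
from typing import List


def find_git_repos(user_dir: Path) -> List[Path]:
    """Return the immediate subdirectories of user_dir that contain a .git entry."""
    return sorted(p for p in user_dir.iterdir() if p.is_dir() and (p / ".git").exists())


def pull_command(repo: Path) -> List[str]:
    """Build the git command that updates one checkout without creating merge commits."""
    return ["git", "-C", str(repo), "pull", "--ff-only"]


def update_all(user_dir: Path) -> None:
    """Run git pull in every repository found under user_dir."""
    for repo in find_git_repos(user_dir):
        print(f"Updating {repo.name}")
        # check=False: one failing repository should not stop the others.
        subprocess.run(pull_command(repo), check=False)
```

Calling `update_all(Path.home() / ".talon" / "user")` would update every checkout, assuming that path is where your Talon user directory lives; on Windows it is usually under `%APPDATA%` instead.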
OS-Specific Dependencies
- Linux
  - Install `spd-say` to play standard TTS.
  - Install `piper` to use the `omnx` model for more natural speech.
    - Run `pipx install piper` to install it (thus `pipx` is a dependency).
- Windows
  - Using NVDA:
    - Install the NVDA addon file from the repo releases page.
    - If you do not want to install the addon, disable `Speech interrupt for typed characters` in NVDA settings to make sure typing text from Talon is not interrupted with every typed character.
- Mac
  - No extra dependencies
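On Linux, it can help to confirm that `spd-say` is installed and audible before debugging anything inside Talon. The sketch below is an assumption-laden check rather than the way this repository calls `spd-say`: the function names are hypothetical, and it relies only on the `--rate` and `--volume` options of `spd-say`, which both accept values from -100 to 100.

```python
import shutil
import subprocess
from typing import List


def spd_say_command(text: str, rate: int = 0, volume: int = 0) -> List[str]:
    """Build an spd-say invocation; rate and volume both range from -100 to 100."""
    for name, value in (("rate", rate), ("volume", volume)):
        if not -100 <= value <= 100:
            raise ValueError(f"{name} must be between -100 and 100, got {value}")
    return ["spd-say", "--rate", str(rate), "--volume", str(volume), text]


def speak(text: str) -> bool:
    """Speak text if spd-say is on PATH; return False when it is missing."""
    if shutil.which("spd-say") is None:
        return False
    subprocess.run(spd_say_command(text), check=False)
    return True
```

If `speak("testing")` returns `False`, the dependency is not installed or not on your `PATH`.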
Talon Installing
You should have already installed Talon and the community repository. If not, see the Talon docs for instructions. In order to best use this repository you should be familiar with the basics of Talon and basic commands from the community repository.
This document is not a replacement for the wiki; it is intended as a quick overview of the most important commands and the most relevant behavior.
Talon Brief Overview
Talon is a voice control engine. It has no behavior of its own until you install scripts, and the standard set comes from the community repository. Each time you say a command, the voice model tries to match it to the closest command that is defined and contextually in scope. For instance, if there are commands specific to Gmail but you are not within Gmail, those commands are out of scope and your phrase will likely be misrecognized as something else. For the same reason, it is very important to be in the proper mode. By default, there are two main modes: dictation mode and command mode. The former is for dictating raw text and the latter is for calling specific commands. If you say a command that is defined in a different mode from the one you are in, Talon will likely still try to interpret the phrase, but it will match it against whatever is available in the current mode.
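The scoping behavior described above can be pictured with a toy model. This is a plain-Python sketch for intuition only, not Talon's actual matching code: the `Command` class, the `phrases_in_scope` function, and the sample command set (including the Gmail phrase) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Command:
    phrase: str
    mode: str                  # "command" or "dictation"
    app: Optional[str] = None  # None means the command is global


def phrases_in_scope(commands: List[Command], mode: str, focused_app: str) -> List[str]:
    """Return the phrases that are active for the current mode and focused app."""
    return [
        c.phrase
        for c in commands
        if c.mode == mode and c.app in (None, focused_app)
    ]


# Hypothetical command set for illustration only.
COMMANDS = [
    Command("talon sleep", mode="command"),
    Command("archive message", mode="command", app="Gmail"),
    Command("scratch that", mode="dictation"),
]
```

With this model, `phrases_in_scope(COMMANDS, "command", "Terminal")` leaves out the Gmail phrase, so speaking it there can only be matched against whatever else is in scope.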
Debugging Talon Issues
If you are not getting the proper behavior within Talon, the cause is most often a poor microphone or an error in your scripts. You do not need a fancy microphone to get good performance with Talon; however, too much background noise, static, or fan noise is likely to cause issues. If you are getting poor recognition, check the "save recordings" option within the Talon tray icon menu. This will allow you to hear what Talon hears for a given phrase.
Helpful Standard Talon Commands
Command | Description |
---|---|
command mode | Switches Talon into command mode, where your words are interpreted as commands |
dictation mode | Switches Talon into dictation mode, where your words are interpreted as raw text |
launch | Launches the specified application |
focus | Focuses the specified application |
talon wake | Wakes Talon up if it is asleep |
talon sleep | Puts Talon to sleep |
press | Presses the specified key |
air bat cap drum each fine gust harp sit jury crunch look made near odd pit quench red sun trap urge vest whale plex yank zip | The Talon phonetic alphabet |
sentence | Dictate a sentence with the first word capitalized |
title | Dictate a sentence with all words capitalized |
word | Dictate a single word |
scratch that | Undoes the last thing you said |
wipe | Presses backspace |
Commands Specific to Sight-Free Talon
Sight-Free-Talon has a series of voice commands and settings to make Talon easier to use alongside screen readers. Any general commands for dictating text or controlling your computer can be found in the central community repo, which you should also have installed.
Commands
TODO
Settings
All settings can be set within `.talon` files and contextually scoped to specific applications.
Setting | Description | Default Value |
---|---|---|
user.echo_dictation | Echo the subtitles from Talon back via TTS | true |
user.tts_speed | How fast to play back text-to-speech, from -10 to 10 | 8 |
user.tts_volume | How loud to play back text-to-speech from 0 to 100 | 80 |
user.echo_context | Automatically echo the context of the focused window when switching applications/tabs | false |
user.tts_via_screenreader | If a screen reader is enabled, use it for TTS instead of the TTS engine in Talon | true |
user.nvda_key | Key used as the NVDA modifier; change to 'insert' if that is your NVDA modifier | 'capslock' |
user.start_screenreader_on_startup | Start your screen reader automatically when Talon starts | false |
user.braille_output | Output dictated text to braille display through your screen reader | false |
user.sound_on_keypress | To prevent errors from accidental key presses, play a sound each time a key is pressed | false |
user.disable_keypresses | Disable keypresses from Talon in high risk contexts that cannot afford typos | false |
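
To illustrate what a contextually scoped settings file looks like, the sketch below renders one as text: context matchers first, then a `-` separator, then a `settings():` block. The function name is hypothetical, the `app.name: Firefox` matcher is only an example, and the range checks simply mirror the table above rather than anything enforced by the repository.

```python
from typing import Dict

# Ranges taken from the settings table; anything not listed is passed through unchecked.
RANGES = {"user.tts_speed": (-10, 10), "user.tts_volume": (0, 100)}


def render_talon_settings(matchers: Dict[str, str], settings: Dict[str, int]) -> str:
    """Render a .talon file: context matchers, a '-' separator, then a settings() block."""
    for name, value in settings.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    lines = [f"{key}: {value}" for key, value in matchers.items()]
    lines += ["-", "settings():"]
    lines += [f"    {name} = {value}" for name, value in settings.items()]
    return "\n".join(lines) + "\n"


# 'app.name: Firefox' is an illustrative matcher, not one taken from the repository.
EXAMPLE = render_talon_settings(
    {"os": "windows", "app.name": "Firefox"},
    {"user.tts_speed": 5, "user.tts_volume": 60},
)
```

Saving the text in `EXAMPLE` as a `.talon` file in your user directory would apply the slower speed and lower volume only while both matchers are true.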
Contributing
My goal is to make contributing as easy as possible. Please directly reach out to me if you are interested in contributing to this repository. I am happy to help you get started and answer any questions you may have.
I can be reached either through the Talon Slack or my website, https://colton.place.
Technical Contributions
The project repository is structured such that every screen reader or unique feature gets its own folder. Each folder contains a `.talon` file with the commands for that screen reader or feature. Any features related to global scope or settings are in the root settings `.talon` file. If you would like to add support for a new screen reader, I encourage you to follow the format of the other screen readers and implement similar function overrides. All baseline declarations that are contextually overridden are in the `core` folder.
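The baseline-plus-override layout can be sketched in plain Python. Talon itself handles this through its own module and context action system, which is not reproduced here; every name below (`core_tts`, `nvda_tts`, `OVERRIDES`, `tts`) is hypothetical and only shows the shape of the idea: one baseline declaration, with per-screen-reader implementations that replace it when that screen reader is active.

```python
from typing import Callable, Dict, List, Optional

spoken: List[str] = []  # stands in for real audio output so the sketch is testable


def core_tts(text: str) -> None:
    """Baseline behavior, used when no screen reader override applies."""
    spoken.append(f"talon: {text}")


def nvda_tts(text: str) -> None:
    """Override that would route speech through NVDA instead."""
    spoken.append(f"nvda: {text}")


# One entry per supported screen reader; a new screen reader adds its own entry.
OVERRIDES: Dict[str, Callable[[str], None]] = {"nvda": nvda_tts}


def tts(text: str, active_screenreader: Optional[str]) -> None:
    """Dispatch to the override for the active screen reader, else to the baseline."""
    handler = core_tts if active_screenreader is None else OVERRIDES.get(active_screenreader, core_tts)
    handler(text)
```

Adding support for another screen reader in this model means writing one more function and registering it, without touching the baseline.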
Testing with Your Own Setup
I do not have the resources to test certain combinations of screen readers and operating systems. If you would like to contribute to this repository, I encourage you to test the commands on your own setup and provide feedback. If you are not familiar with GitHub, you can directly get in contact with me.
To check for errors, you can send me a copy of your Talon log.
Non-Technical Contribution
I greatly benefit from general qualitative design feedback and learning more about the particular workflows of users. My intention is for this repository to be useful for people of all abilities and technical skill levels, so I am very interested in hearing about any difficulties you may have with the repository or Talon in general.
If you are a user with a vision impairment, I am curious to hear how you have interacted with voice dictation software in the past. I could also use qualitative feedback regarding things like alternative computer feedback mechanisms, such as braille, haptic feedback, or pitch-based audio feedback. I am curious to explore different ways of providing information to the user, and am excited about exploring more experimental ideas.
Philosophy
In this repository, we want to create a solution that is low friction and feels natural and unintrusive for all users. For users with eyestrain or a vision impairment, we want our repository not to feel like a begrudgingly accepted accommodation, but an exciting improvement that unlocks new forms of human-computer interaction. This is similar to what Cursorless has accomplished with voice programming. It is not simply a way to code when you do not have access to your hands; rather, it is a whole new way to think about coding, one that is often more efficient to begin with. By using Talon's scripting potential and various community tools and AI integrations, we have the potential to realize this within low vision tools as well. After all, the most accessible tasks are the ones that can be automated away and don't need to be done in the first place.
As such, the repository is designed around a series of core principles:
- Keep as much behavior directly in Talon as possible and make as few screen-reader specific changes as possible.
  - This makes development easier and more maintainable.
- Make our Talon code well integrated with the rest of the Talon community.
  - This means using the same conventions and style as the rest of the community and not dynamically loading specialized libraries or doing low level hacks if it can be avoided.
- Create a solution that can be used for people of all abilities.
  - This means that we want to make sure that the solution is usable for people who are blind, low vision, or sighted.
- Make sure that the solution is usable for people who are new to Talon and people who are experienced.
  - On install, the solution should work out of the box with minimal configuration.
  - Settings should not change other parts of the user's Talon configuration.
  - All settings should be located in a central settings file.
  - Application specific voice commands should be located in their own specific file and contextually scoped.
- Feature creep is bad and hurts the long term maintainability of the project.
  - If a feature is not used by a large number of people, it should be removed.
  - Focus on a few popular screen readers and make sure they work well.
Sources & Inspirations
Design Patterns
- https://www.afb.org/aw/19/4/15104
- https://github.com/dictationbridge
- http://www.hartgen.org/j-say
- http://www.eklhad.net/philosophy.html
- https://github.com/EmpowermentZone/EdSharp
- https://github.com/aPinix/indent-jump-vscode
- https://www.techrxiv.org/articles/preprint/Image_Captioning_for_the_Visually_Impaired_and_Blind_A_Recipe_for_Low-Resource_Languages/22133894
- https://www.theverge.com/23203911/screen-readers-history-blind-henter-curran-teh-nvda
- https://www.freedomscientific.com/SurfsUp/Speech_Sounds_Schemes.htm
Emacspeak
- https://tvraman.github.io/emacspeak/manual/
- https://emacspeak.blogspot.com/
- https://www.emacswiki.org/emacs/BrailleMode
- https://www.emacswiki.org/emacs/EmacSpeak
- https://emacspeak.sourceforge.net/tips.html
- https://tvraman.github.io/emacspeak/manual/TTS-Servers.html
NVDA
- The NVDA Controller client `.dll` file can be found at: https://www.nvaccess.org/files/nvda/releases/stable/
- Documentation for this controller client can be found at https://github.com/nvaccess/nvda/blob/master/extras/controllerClient/readme.md
- https://github.com/nvaccess/nvda/wiki/
- https://github.com/nvda-es/devguides_translation/blob/master/original_docs/NVDA-Add-on-Development-Guide.md
- https://addons.nvda-project.org/addons/AudioThemes.en.html
- https://github.com/nvdaaddons/DevGuide/wiki/NVDA-Add-on-Development-Guide
- https://addons.nvda-project.org/addons/phoneticPunctuation.en.html
- Helpful settings
  - Document Formatting: "Line Indentation Reporting" set to "Tones"
  - Scratchpad: located within the NVDA advanced settings; there you can import Python scripts during development
  - Disable insert as a modifier key in order to work with Rango
- NVDA Addon Build Process
  - Copyright (C) 2012-2023 NVDA Add-on team contributors. This package is distributed under the terms of the GNU General Public License, version 2 or later. Please see the file COPYING.txt for further details. alekssamos added automatic package of add-ons through Github Actions. For details about Github Actions please see the Workflow syntax for GitHub Actions. Copyright (C) 2022 alekssamos
JAWS
Libraries
- https://github.com/accessibleapps/accessible_output2
- https://github.com/qtnc/UniversalSpeech
- https://rhasspy.github.io/piper-samples/
  - more voices