If old sci-fi shows are anything to go by, we’re all using our computers wrong. We’re still typing with our fingers, like cave people, instead of talking out loud the way the future was supposed to be. Have you ever seen Picard touch a keyboard? Of course not.
And it’s odd because our computers are all capable of turning speech into text by default. The problem? It just doesn’t work very well. Or, at least, it didn’t. In recent years AI models like Nvidia’s Parakeet and OpenAI’s Whisper, both open source, have made great strides in turning human voices into text. Both excel at correctly adding things like punctuation and capitalization, and you can run them right on your computer. Using these models is the closest I’ve felt to recording a captain’s log—it just works.
The catch? They’re both a little complicated to set up. That’s where Handy comes in. It’s a dead-simple, totally free application that can set up either of these models on your computer and give you a keyboard shortcut for dictating with it. It was created by CJ Pais after he broke his finger, which left him unable to type. He wanted a totally free, radically simple way to use existing AI speech-to-text tools.
To get started, simply download Handy. There are versions for Windows, macOS, and Linux. Run the application and you’ll be asked which model you want to use.
Courtesy of Justin Pot
The default, Parakeet V3, is a great place to start—I didn’t feel the need to try out other models after using it.
The model will take a little while to download. Once it finishes, you can start using the application by pressing and holding the keyboard shortcut. By default it’s Control-Space on Windows and Linux or Option-Space on macOS. You’ll see an overlay at the bottom of the screen, letting you know that you’re being recorded and transcribed. You can talk for as long as you like while holding the shortcut. When you’re done, the text will show up in whatever text box is currently active.
I’ve been using Handy to write this article today, and impressively, I don’t even have to turn my music off: the models are good at filtering it out. I don’t know if that would work in all environments, granted, but my music was pretty loud, and it worked well. I even tried speaking a few sentences in French and Spanish, and it worked (and, I imagine, would work even better if my pronunciation weren’t horrible).

