Ojut
Voice dictation using Whisper models.
Features
- 🎙️ Voice-to-text transcription using Whisper models
- ⌨️ Automatic typing of transcribed text into any application
- 🔥 Hotkey-triggered recording (Ctrl+Alt+Cmd+U)
- 🧠 Optional LLM post-processing for:
  - Punctuation and capitalization
  - Grammar correction
  - Speech error cleanup (stutters, filler words, etc.)
  - Customizable via system prompts
- 📚 Personal dictionary support for specialized vocabulary
- 🛠️ Configurable via:
  - YAML config file
  - CLI arguments
  - Environment variables
- 🤖 Supports multiple LLM providers (OpenAI-compatible APIs)
- 🎧 Audio feedback for recording start/stop
Usage
Once the ojut server is running in the background, a typical workflow looks like this:
- Focus on the input field you want to type in
- Press and hold the trigger key (currently Ctrl+Alt+Cmd+U)
- Wait for the audio cue
- Start speaking
- Release the trigger key
- Text gets typed out into the input field
Configuration
You can specify which Whisper model to use, either in the config file or via CLI args. CLI args override the value in the config file. Model selection is currently the only option here, but more will be added in the future.
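The override rule above can be sketched in a few lines of Go. This is only an illustration of the precedence, not Ojut's actual code: `resolveModel` and its `cliSet` parameter are hypothetical names. The separate `cliSet` flag is there so that an explicitly passed empty value can be told apart from a flag that was never given.

```go
package main

import "fmt"

// resolveModel picks the model value to use.
// Hypothetical helper: if the CLI flag was passed explicitly (cliSet),
// its value wins, even when it is empty. Otherwise the config-file
// value is used.
func resolveModel(cliValue string, cliSet bool, configValue string) string {
	if cliSet {
		return cliValue
	}
	return configValue
}

func main() {
	// CLI flag given: it overrides the config file.
	fmt.Println(resolveModel("tiny.en-q8_0", true, "medium.en-q8_0"))
	// CLI flag not given: the config-file value is used.
	fmt.Println(resolveModel("", false, "medium.en-q8_0"))
}
```

With Go's standard `flag` package, one way to find out whether a flag was actually passed is `flag.Visit`, which only visits flags that have been set.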
LLM Post-Processing
Ojut can optionally post-process transcribed text using an LLM for better formatting and punctuation. This feature requires:
- Setting up API credentials:
export OJUT_LLM_API_KEY="your-api-key" # or use OPENAI_API_KEY
- Enabling in config or CLI:
post_process: true
llm_system_prompt: "Cleanup the following transcript and add punctuation. Do not change anything else."
Or via CLI:
ojut --post-process
You can customize the system prompt to modify how the LLM processes text. For example:
To clean up speech mistakes and improve readability:
llm_system_prompt: >
  You are a transcription assistant. Your task is to:
  1. Correct any speech errors, stutters, or mispronunciations
  2. Add proper punctuation and capitalization
  3. Improve grammar while preserving the original meaning
  4. Remove filler words like "um", "uh", etc.
  5. Format the text into clear, coherent sentences
  Do not add any content that wasn't in the original transcript.
For basic punctuation and formatting:
llm_system_prompt: >
  Clean up the following transcript by adding punctuation and capitalization.
  Do not change the wording or meaning of the text.
- Configuration options:
  - In config file:
llm_model: "gpt-4o" # Model name
llm_base_url: "https://your-llm-endpoint" # API endpoint
  - Via CLI:
ojut --llm-model "gpt-4o" --llm-base-url "https://your-llm-endpoint"
  - Environment variables:
export OJUT_LLM_ENDPOINT="https://your-llm-endpoint" # defaults to OpenAI (you can use any OpenAI compatible endpoint)
export OJUT_LLM_MODEL="gpt-4o" # defaults to gpt-4o-mini
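To make the pieces above concrete, here is a rough Go sketch of how the settings could come together into a request body for an OpenAI-compatible chat completions endpoint. It is an assumption that Ojut uses the chat completions format. The helper names (`firstNonEmpty`, `buildChatRequest`) and the sample prompt and transcript are made up for illustration. Nothing is sent over the network here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// chatMessage and chatRequest mirror the basic shape of an
// OpenAI-compatible chat completions request body.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// firstNonEmpty returns the first non-empty value, or "" if all are empty.
// Hypothetical helper used to express "X, or fall back to Y".
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}

// buildChatRequest pairs the system prompt with the raw transcript
// and encodes the result as JSON.
func buildChatRequest(model, systemPrompt, transcript string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model: model,
		Messages: []chatMessage{
			{Role: "system", Content: systemPrompt},
			{Role: "user", Content: transcript},
		},
	})
}

func main() {
	// OJUT_LLM_API_KEY is preferred, OPENAI_API_KEY is the fallback.
	apiKey := firstNonEmpty(os.Getenv("OJUT_LLM_API_KEY"), os.Getenv("OPENAI_API_KEY"))
	// The model defaults to gpt-4o-mini when OJUT_LLM_MODEL is unset.
	model := firstNonEmpty(os.Getenv("OJUT_LLM_MODEL"), "gpt-4o-mini")

	body, err := buildChatRequest(model,
		"Clean up the following transcript by adding punctuation and capitalization.",
		"so um this is a test")
	if err != nil {
		fmt.Fprintln(os.Stderr, "encoding request:", err)
		os.Exit(1)
	}
	fmt.Println("API key set:", apiKey != "")
	fmt.Println(string(body))
}
```

The system prompt goes in as the `system` message and the transcript as the `user` message, which is why changing `llm_system_prompt` changes how the text gets cleaned up.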
Dictionary
You can specify a personal dictionary in a separate text file. Each line should contain one word or phrase that you want the model to recognize. The dictionary file should be located at ~/.config/ojut/dictionary.
Example dictionary:
Ojut
Golang
meain
Technical terms
Company names
Note that you have to restart the server for changes to take effect.
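The one-entry-per-line format is easy to parse. Below is a small Go sketch of how such a file could be read into a list of entries. `parseDictionary` is a hypothetical helper, and skipping blank lines and trimming whitespace are assumptions about how stray formatting might be handled, not documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDictionary splits dictionary text into entries, one per line.
// Assumption: surrounding whitespace is trimmed and blank lines are skipped.
func parseDictionary(text string) []string {
	var entries []string
	for _, line := range strings.Split(text, "\n") {
		entry := strings.TrimSpace(line)
		if entry != "" {
			entries = append(entries, entry)
		}
	}
	return entries
}

func main() {
	// In practice the text would come from ~/.config/ojut/dictionary,
	// for example via os.ReadFile. A literal keeps the sketch self-contained.
	sample := "Ojut\nGolang\n\n  meain  \n"
	for _, entry := range parseDictionary(sample) {
		fmt.Println(entry)
	}
}
```

Since the file is only read at startup, this also shows why a restart is needed after editing it.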
Here is what the model setting looks like in the config file:
model: "medium.en-q8_0" # use "tiny.en-q8_0" if you have a slow machine
Here is how you would specify the model using the CLI:
ojut -model tiny.en-q8_0
You can also pass in an absolute path to a model file:
ojut -model "/path/to/models/tiny.en.bin"
FYI, the models used in the FUTO keyboard seem to be pretty good.
You can specify the model as empty (ojut -model "") to show a picker.
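The model value can therefore mean three different things: a named model, an absolute path to a model file, or "show me a picker". A minimal Go sketch of that decision is below. The type and function names (`modelSource`, `classifyModel`) are invented for illustration, and how Ojut actually tells the cases apart is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// modelSource describes how a model value should be interpreted.
type modelSource int

const (
	modelPicker modelSource = iota // empty value: let the user choose
	modelFile                      // absolute path to a model file
	modelNamed                     // named model such as "tiny.en-q8_0"
)

// String returns a short human-readable label for the source.
func (s modelSource) String() string {
	switch s {
	case modelPicker:
		return "picker"
	case modelFile:
		return "file"
	default:
		return "named"
	}
}

// classifyModel decides how to treat the value passed as the model.
func classifyModel(value string) modelSource {
	switch {
	case value == "":
		return modelPicker
	case filepath.IsAbs(value):
		return modelFile
	default:
		return modelNamed
	}
}

func main() {
	for _, v := range []string{"", "/path/to/models/tiny.en.bin", "tiny.en-q8_0"} {
		fmt.Printf("%q -> %s\n", v, classifyModel(v))
	}
}
```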
Installation
You can run ojut directly via the Nix flake:
nix run github:meain/ojut
Alternatively, install it manually:
- Install portaudio
- Install whisper-cpp (needs to be available in your PATH)
- Install ojut (go install github.com/meain/ojut@latest)
What is with the name?
I'm glad you asked. It's very stupid, but here is how I got to the name. "Dictation" in Japanese is "口述筆記", which sounds like "Kōjutsu hikki". I just took part of the first word, "ojut". It was unique enough, and I was tired of looking for a name, so I decided to just use it.