Ojut

Voice dictation using Whisper models.

Features

  • 🎙️ Voice-to-text transcription using Whisper models
  • ⌨️ Automatic typing of transcribed text into any application
  • 🔥 Hotkey-triggered recording (Ctrl+Alt+Cmd+U)
  • 🧠 Optional LLM post-processing for:
    • Punctuation and capitalization
    • Grammar correction
    • Speech error cleanup (stutters, filler words, etc.)
    • Customizable via system prompts
  • 📚 Personal dictionary support for specialized vocabulary
  • 🛠️ Configurable via:
    • YAML config file
    • CLI arguments
    • Environment variables
  • 🤖 Supports multiple LLM providers (OpenAI-compatible APIs)
  • 🎧 Audio feedback for recording start/stop

Usage

Once you have the ojut server running in the background, here is what a sample workflow would look like:

  • Focus the input field you want to type into
  • Press and hold the trigger key (currently ctrl+alt+cmd+u)
  • Wait for the audio cue
  • Start speaking
  • Release the trigger key
  • The transcribed text is typed into the input field
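
To get the server up in the first place, here is a minimal sketch (assuming that invoking ojut with no arguments starts the long-running server; the log path is just an example):

    # run the ojut server in the background of the current shell
    ojut &
    # or keep it running after the terminal closes (example log path)
    nohup ojut > /tmp/ojut.log 2>&1 &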

Configuration

You can specify the Whisper model to use, either in the config file or via CLI arguments; CLI arguments override the value in the config file. Only the model can be set this way for now, but more options will be added in the future.

Here is what the config file looks like:

model: "medium.en-q8_0" # use "tiny.en-q8_0" if you have a slow machine

Here is how you would specify it using the CLI:

ojut -model tiny.en-q8_0

You can also pass an absolute path to a model file:

ojut -model "/path/to/models/tiny.en.bin"

FYI, the models used in the FUTO keyboard seem to be pretty good.

You can also pass an empty model name (ojut -model "") to show a picker.

LLM Post-Processing

Ojut can optionally post-process transcribed text using an LLM for better formatting and punctuation. This feature requires:

  1. Setting up API credentials:

    export OJUT_LLM_API_KEY="your-api-key"  # or use OPENAI_API_KEY
    
  2. Enabling in config or CLI:

    post_process: true
    llm_system_prompt: "Cleanup the following transcript and add punctuation. Do not change anything else."
    

    Or via CLI:

    ojut --post-process
    

    You can customize the system prompt to modify how the LLM processes text. For example:

    To clean up speech mistakes and improve readability:

    llm_system_prompt: >
      You are a transcription assistant. Your task is to:
      1. Correct any speech errors, stutters, or mispronunciations
      2. Add proper punctuation and capitalization
      3. Improve grammar while preserving the original meaning
      4. Remove filler words like "um", "uh", etc.
      5. Format the text into clear, coherent sentences
      Do not add any content that wasn't in the original transcript.
    

    For basic punctuation and formatting:

    llm_system_prompt: >
      Clean up the following transcript by adding punctuation and capitalization.
      Do not change the wording or meaning of the text.
    
  3. Configuration options:

    • In config file:
      llm_model: "gpt-4o"  # Model name
      llm_base_url: "https://your-llm-endpoint"  # API endpoint
      
    • Via CLI:
      ojut --llm-model "gpt-4o" --llm-base-url "https://your-llm-endpoint"
      
    • Environment variables:
      export OJUT_LLM_ENDPOINT="https://your-llm-endpoint"  # defaults to OpenAI (any OpenAI-compatible endpoint works)
      export OJUT_LLM_MODEL="gpt-4o"  # defaults to gpt-4o-mini
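
Putting it together, a sketch of a one-shot invocation from the shell, using only the flags and environment variables listed above, could look like this:

    # enable LLM post-processing against any OpenAI-compatible endpoint
    export OJUT_LLM_API_KEY="your-api-key"               # or OPENAI_API_KEY
    export OJUT_LLM_ENDPOINT="https://your-llm-endpoint" # optional, defaults to OpenAI
    ojut --post-process --llm-model "gpt-4o"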
    
Dictionary

You can specify a personal dictionary in a separate text file. Each line should contain one word or phrase that you want the model to recognize. The dictionary file should be located at ~/.config/ojut/dictionary.

Example dictionary:

Ojut
Golang
meain
Technical terms
Company names

Note that you have to restart the server for changes to take effect.
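
As an illustration, you could create the dictionary file from the shell like this (entries taken from the example above):

    # create ~/.config/ojut and write one word or phrase per line
    mkdir -p ~/.config/ojut
    printf '%s\n' Ojut Golang meain > ~/.config/ojut/dictionary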


Installation

  • Install portaudio
  • Install whisper-cpp (needs to be available in PATH)
  • Install ojut (go install github.com/meain/ojut@latest)

Alternatively, you can run it via the nix flake with nix run github:meain/ojut.
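
As a rough sketch of the manual route, assuming Homebrew on macOS (use your platform's package manager for portaudio and whisper-cpp otherwise):

    # install the audio and transcription dependencies (assumes Homebrew)
    brew install portaudio whisper-cpp
    # install the ojut binary into your Go bin directory
    go install github.com/meain/ojut@latest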

What is with the name?

I'm glad you asked. It's very stupid, but here is how I got to the name. "Dictation" in Japanese is "口述筆記", which sounds like "kōjutsu hikki". I just took part of the first word: "ojut". It was unique enough, I was tired of looking for a name, and so I decided to just use it.
