podscript

command module
v0.0.0-...-4108302
Published: Mar 5, 2025 License: MIT Imports: 36 Imported by: 0

README

podscript is a tool to generate transcripts for podcasts (and other similar audio files), using LLMs and Speech-to-Text (STT) APIs.

Install

> go install github.com/deepakjois/podscript@latest

> ~/go/bin/podscript --help

Web UI

podscript has a web-based UI for convenience:

> podscript web
Starting server on port 8080

This runs a web server at http://localhost:8080.

For more advanced usage, see the CLI section below.

CLI Getting started

# Configure keys for supported services (OpenAI, Anthropic, Deepgram etc)
# and write them to $HOME/.podscript.toml
podscript configure

# Transcribe a YouTube video by formatting and cleaning up autogenerated captions
podscript ytt https://www.youtube.com/watch?v=aO1-6X_f74M

# Transcribe audio from a URL using the Deepgram speech-to-text API
#
# The Deepgram and AssemblyAI subcommands support `--from-url` for
# passing audio URLs, and `--from-file` to pass audio files.
podscript deepgram --from-url https://audio.listennotes.com/e/p/d6cc86364eb540c1a30a1cac2b77b82c/

# Transcribe audio from a file using Groq's Whisper model
# Groq only supports audio files.
podscript groq --file huberman.mp3

More Info

Models for ytt subcommand

The ytt subcommand uses the gpt-4o model by default. Use the --model flag to set a different model. The following models are supported:

  • OpenAI
    • gpt-4o
    • gpt-4o-mini
  • Google Gemini
    • gemini-2.0-flash
  • Llama (via Groq)
    • llama-3.3-70b-versatile
    • llama-3.1-8b-instant
  • Anthropic
    • claude-3-5-sonnet-20241022
    • claude-3-5-haiku-20241022
  • Anthropic via Amazon Bedrock
    • anthropic.claude-3-5-sonnet-20241022-v2:0 (via AWS)
    • anthropic.claude-3-5-haiku-20241022-v1:0 (via AWS)
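
The list above can be read as a lookup from model name to provider. The following is only an illustrative sketch in Go, not podscript's actual implementation: the names supportedModels and providerFor are hypothetical, and the provider labels are copied from the list headings.

```go
package main

import "fmt"

// supportedModels maps each model name from the list above to its provider.
// Hypothetical name for illustration; not taken from podscript's source.
var supportedModels = map[string]string{
	"gpt-4o":                  "OpenAI",
	"gpt-4o-mini":             "OpenAI",
	"gemini-2.0-flash":        "Google Gemini",
	"llama-3.3-70b-versatile": "Llama (via Groq)",
	"llama-3.1-8b-instant":    "Llama (via Groq)",
	"claude-3-5-sonnet-20241022":                "Anthropic",
	"claude-3-5-haiku-20241022":                 "Anthropic",
	"anthropic.claude-3-5-sonnet-20241022-v2:0": "Anthropic via Amazon Bedrock",
	"anthropic.claude-3-5-haiku-20241022-v1:0":  "Anthropic via Amazon Bedrock",
}

// providerFor (hypothetical helper) returns the provider for a model name
// and reports whether the model appears in the list above.
func providerFor(model string) (string, bool) {
	provider, ok := supportedModels[model]
	return provider, ok
}

func main() {
	for _, m := range []string{"gpt-4o", "gemini-2.0-flash", "gpt-3.5-turbo"} {
		if provider, ok := providerFor(m); ok {
			fmt.Printf("%s -> %s\n", m, provider)
		} else {
			fmt.Printf("%s is not in the supported list\n", m)
		}
	}
}
```

A check like this is one way a value passed to --model could be validated before any API call is made; whether podscript validates model names this way is an assumption.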

Transcript from audio URLs and files

Tip: You can find the audio download link for a podcast on ListenNotes under the More menu.

podscript supports the following Speech-To-Text (STT) APIs:

  • Deepgram (as of Jan 2025, offers $200 in free credit on signup)
  • AssemblyAI (as of Oct 2024, free to use within your credit limit, with $50 in free credit on signup)
  • Groq (as of Jul 2024, in beta and free to use within your rate limits)

Development

Want to contribute? Here's how to build and run the project locally:

Prerequisites

The steps below use Go, npm (Node.js), and Caddy.

Build and run the frontend:

cd web/frontend
npm run dev

Build the backend server and run it in dev mode:

go build -o podscript
./podscript web --dev

This starts the backend server and exposes only the API endpoints, without bundling the frontend assets.

To connect the frontend and the backend:

cd web
caddy run

This should set everything up so that you can visit http://localhost:8080 and have the frontend connected to the backend through the Caddy reverse proxy.

Feedback

Feel free to drop me a note on X or by email.

License

MIT

Documentation

Overview

This file provides a Kong resolver that loads configuration from a TOML file. It is a lightly modified version of the kongtoml package.

It checks whether the ytt subcommand is used and, if so, uses the parent path to construct the key. This makes the configuration file more readable and ergonomic.
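
One possible reading of this key construction is sketched below in plain Go. It is not the package's code and does not use the Kong API. It assumes that the lookup key is the command path joined with the flag name by dots, and that for ytt the subcommand's own name is left out of the path. The function configKey and the flag names in main are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// configKey (hypothetical) builds a dotted TOML lookup key for a flag.
// Assumption: keys are normally "<command path>.<flag name>", but when the
// innermost command is "ytt" the parent path is used, so the "ytt" segment
// is left out of the key.
func configKey(commandPath []string, flagName string) string {
	path := commandPath
	if n := len(path); n > 0 && path[n-1] == "ytt" {
		path = path[:n-1] // fall back to the parent path
	}
	// Copy before appending so the caller's slice is never modified.
	parts := append(append([]string{}, path...), flagName)
	return strings.Join(parts, ".")
}

func main() {
	// Flag names here are made up for illustration.
	fmt.Println(configKey([]string{"ytt"}, "openai-api-key"))  // prints "openai-api-key"
	fmt.Println(configKey([]string{"deepgram"}, "api-key"))    // prints "deepgram.api-key"
}
```

Under this reading, keys used by ytt would sit at the top level of $HOME/.podscript.toml instead of inside a ytt table, which would fit the stated goal of a more readable configuration file.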
