chew

Version: v1.0.0
Published: Aug 27, 2024 License: MIT Imports: 11 Imported by: 0

README

A Go library for processing various content types into markdown or plaintext.

About

Chew is a Go library that processes various content types into markdown or plaintext. It supports multiple content types, including HTML, PDF, CSV, JSON, YAML, DOCX, PPTX, Markdown, Plaintext, MP3, FLAC, and WAVE.

Installation

go get github.com/mmatongo/chew

Usage

Here's a basic example of how to use Chew:

package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"time"

	"github.com/mmatongo/chew"
)

func main() {
	// Inputs can be URLs or file paths.
	urls := []string{
		"https://example.com",
	}

	// The context is optional; here it bounds processing to 5 seconds.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	chunks, err := chew.Process(urls, ctx)
	if err != nil {
		// errors.Is also matches a deadline error that has been wrapped.
		if errors.Is(err, context.DeadlineExceeded) {
			log.Println("Operation timed out")
		} else {
			log.Printf("Error processing URLs: %v", err)
		}
		return
	}

	// Each chunk carries its source and the extracted content.
	for _, chunk := range chunks {
		fmt.Printf("Source: %s\nContent: %s\n\n", chunk.Source, chunk.Content)
	}
}

Output

Source: https://example.com
Content: Example Domain

Source: https://example.com
Content: This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

Source: https://example.com
Content: More information...
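
As the output shows, a single source can yield several chunks. If you would rather have one document per source, the chunks can be grouped after processing. The following is a minimal sketch, not part of Chew: the Chunk type here is a local stand-in that assumes only the Source and Content fields used in the example, and GroupBySource is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a local stand-in for the library's chunk type (an assumption);
// only the Source and Content fields from the usage example are modelled.
type Chunk struct {
	Source  string
	Content string
}

// GroupBySource (hypothetical helper) joins the content of all chunks that
// share a source. It returns the sources in first-seen order and a map from
// source to the joined content.
func GroupBySource(chunks []Chunk) (order []string, docs map[string]string) {
	parts := make(map[string][]string)
	for _, c := range chunks {
		if _, seen := parts[c.Source]; !seen {
			order = append(order, c.Source)
		}
		parts[c.Source] = append(parts[c.Source], c.Content)
	}

	docs = make(map[string]string, len(parts))
	for src, p := range parts {
		// Separate chunks with a blank line so paragraphs stay distinct.
		docs[src] = strings.Join(p, "\n\n")
	}
	return order, docs
}

func main() {
	chunks := []Chunk{
		{Source: "https://example.com", Content: "Example Domain"},
		{Source: "https://example.com", Content: "More information..."},
	}

	order, docs := GroupBySource(chunks)
	for _, src := range order {
		fmt.Printf("Source: %s\n%s\n", src, docs[src])
	}
}
```

Returning the source order separately keeps the result deterministic, since Go does not guarantee map iteration order.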

You can find more examples in the examples directory as well as instructions on how to use Chew with Ruby and Python.

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have any suggestions or improvements.

License

This project is licensed under the MIT License - see the LICENSE file for details.

The logo was made by the amazing MariaLetta.

Similar Projects

docconv

Roadmap

The roadmap for this project is available here. It's meant more as a guide than a strict plan because I only work on this project in my free time.

Documentation

Constants

This section is empty.

Variables

var Transcribe = transcribe.Transcribe

Transcribe uses the Google Cloud Speech-to-Text API to transcribe an audio file. It takes a context, the filename of the audio file to transcribe, and a TranscribeOptions struct which contains the Google Cloud credentials, the GCS bucket to upload the audio file to, and the language code to use for transcription. It returns the transcript of the audio file as a string and an error if the transcription fails.

Functions

func Process

func Process(urls []string, ctxs ...context.Context) ([]common.Chunk, error)

Process takes a list of URLs and returns a list of Chunks.

The strings to be processed can be URLs or file paths. The context is optional and can be used to cancel processing after a certain amount of time.

Types

type TranscribeOptions

type TranscribeOptions = transcribe.TranscribeOptions

Directories

Path Synopsis
cmd
internal
