A Go library for processing various content types into markdown/plaintext.
About
Chew is a Go library that processes various content types into markdown or plaintext. It supports multiple content types, including HTML, PDF, CSV, JSON, YAML, DOCX, PPTX, Markdown, Plaintext, MP3, FLAC, and WAVE.
Installation
go get github.com/mmatongo/chew
Usage
Here's a basic example of how to use Chew:
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"time"

	"github.com/mmatongo/chew"
)

func main() {
	urls := []string{
		"https://example.com",
	}

	// The context is optional; here it bounds the whole run to 5 seconds.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	chunks, err := chew.Process(urls, ctx)
	if err != nil {
		// errors.Is also matches a deadline error that has been wrapped.
		if errors.Is(err, context.DeadlineExceeded) {
			log.Println("Operation timed out")
		} else {
			log.Printf("Error processing URLs: %v", err)
		}
		return
	}

	// Each chunk records the source it came from and a piece of its content.
	for _, chunk := range chunks {
		fmt.Printf("Source: %s\nContent: %s\n\n", chunk.Source, chunk.Content)
	}
}
Output
Source: https://example.com
Content: Example Domain
Source: https://example.com
Content: This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.
Source: https://example.com
Content: More information...
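As the output above shows, a single source can yield several chunks. If you want one block of text per source, the chunks can be merged after processing. The following is a minimal sketch, not part of Chew: the `Chunk` type is a local stand-in that only mirrors the `Source` and `Content` fields used in the example, and `joinBySource` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a local stand-in for illustration only. It mirrors the two
// fields (Source, Content) printed in the usage example and is an
// assumption, not the library's actual type definition.
type Chunk struct {
	Source  string
	Content string
}

// joinBySource (hypothetical helper) concatenates the contents of all
// chunks that share a source, separated by blank lines. It also returns
// the sources in the order they were first seen, since map iteration
// order in Go is not stable.
func joinBySource(chunks []Chunk) (order []string, docs map[string]string) {
	parts := make(map[string][]string)
	for _, c := range chunks {
		if _, seen := parts[c.Source]; !seen {
			order = append(order, c.Source)
		}
		parts[c.Source] = append(parts[c.Source], c.Content)
	}

	docs = make(map[string]string, len(parts))
	for src, p := range parts {
		docs[src] = strings.Join(p, "\n\n")
	}
	return order, docs
}

func main() {
	// Sample data shaped like the output shown above.
	chunks := []Chunk{
		{Source: "https://example.com", Content: "Example Domain"},
		{Source: "https://example.com", Content: "More information..."},
	}

	order, docs := joinBySource(chunks)
	for _, src := range order {
		fmt.Printf("Source: %s\n%s\n", src, docs[src])
	}
}
```

With the real library, the same loop would presumably run over the slice returned by `chew.Process`, provided its elements expose the same two fields.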
You can find more examples in the examples directory as well as instructions on how to use Chew with Ruby and Python.
Contributing
Contributions are welcome! Feel free to open an issue or submit a pull request if you have any suggestions or improvements.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Logo
The logo was made by the amazing MariaLetta.
Similar Projects
docconv
Roadmap
The roadmap for this project is available here. It's meant more as a guide than a strict plan because I only work on this project in my free time.