url2epub

Version: v0.4.0
Published: Aug 14, 2022 | License: BSD-3-Clause | Imports: 24 | Imported by: 3

README

url2epub

Create ePub files from URLs

Overview

The root directory provides a Go library that creates ePub files out of URLs, with limitations.

The rmapi/ directory provides a Go library that implements the reMarkable API, so that the generated ePub files can be sent directly to a reMarkable paper tablet.

The tgbot/ directory provides a Go library that implements part of the Telegram bot API, so that all of this can be done from a Telegram message.

The cloudrun/ directory provides the Google Cloud Run implementation of the Telegram bot that does all of this, and that also serves REST APIs.

License

BSD 3-Clause.

Documentation

Overview

Package url2epub fetches http(s) URLs and creates ePub files from them.

Constants

const EpubMimeType = `application/epub+zip`

EpubMimeType is the mime type for epub.

Functions

func DrainAndClose

func DrainAndClose(r io.ReadCloser) error

DrainAndClose drains and closes r.

func Epub

func Epub(args EpubArgs) (id string, err error)

Epub creates an Epub 3.0 file from the given content.

Types

type EpubArgs

type EpubArgs struct {
	// The destination to write the epub content to.
	Dest io.Writer

	// The title of the epub.
	Title string

	// The node pointing to the html tag.
	Node *html.Node

	// Images map:
	// key: image local filename
	// value: image content
	Images map[string]io.Reader
}

EpubArgs defines the args used by the Epub function.

type GetHTMLArgs

type GetHTMLArgs struct {
	// The HTTP GET URL, required.
	URL string

	// The User-Agent header to use, optional.
	UserAgent string

	// The bearer token for the twitter client.
	// If non-empty and the URL is a twitter URL,
	// it uses Twitter API to get the thread instead of the raw HTML.
	TwitterBearer string
}

GetHTMLArgs defines the arguments used by the GetHTML function.

type Node

type Node html.Node

Node is html.Node typedef'd with helper functions attached. It is normally used as *Node in place of *html.Node.

func FromNode

func FromNode(n *html.Node) *Node

FromNode casts *html.Node into *Node.

func GetHTML

func GetHTML(ctx context.Context, args GetHTMLArgs) (*Node, *url.URL, error)

GetHTML performs an HTTP GET request for HTML content and parses the response.

It's different from standard http.Get in the following ways:

- If there are redirects happening during the request, returned URL will be the URL of the last (final) request.

- Instead of returning *http.Response, it returns the parsed *html.Node, with Type being ElementNode and DataAtom being Html (instead of the root node, which is usually DocumentNode).

- The client used by GetHTML does not have a timeout set. The ctx passed in is expected to carry a deadline.

func (Node) AsNode

func (n Node) AsNode() html.Node

AsNode casts n back to html.Node.

func (*Node) FindFirstAtomNode

func (n *Node) FindFirstAtomNode(a atom.Atom) *Node

FindFirstAtomNode returns n itself or the first node in its descendants, with Type == html.ElementNode and DataAtom == a, using depth first search.

If neither n nor any of its descendants matches, nil will be returned.
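The search order described above can be sketched without the html package. The node type below is a simplified stand-in for html.Node that keeps only a tag name and the FirstChild/NextSibling links, and findFirst is a hypothetical function. It illustrates the depth-first traversal, not the library's code.

```go
package main

import "fmt"

// node is a simplified stand-in for html.Node: it keeps only the tag name
// and the FirstChild/NextSibling links that a depth-first search needs.
type node struct {
	Tag         string
	FirstChild  *node
	NextSibling *node
}

// findFirst returns n itself or its first descendant (in depth-first order)
// whose Tag equals tag, or nil if there is no match.
func findFirst(n *node, tag string) *node {
	if n == nil {
		return nil
	}
	if n.Tag == tag {
		return n
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if found := findFirst(c, tag); found != nil {
			return found
		}
	}
	return nil
}

// sampleTree builds html with two children: head (containing title)
// and body (containing p).
func sampleTree() *node {
	title := &node{Tag: "title"}
	p := &node{Tag: "p"}
	body := &node{Tag: "body", FirstChild: p}
	head := &node{Tag: "head", FirstChild: title, NextSibling: body}
	return &node{Tag: "html", FirstChild: head}
}

func main() {
	root := sampleTree()
	fmt.Println("found p:", findFirst(root, "p") != nil)
	fmt.Println("found table:", findFirst(root, "table") != nil)
}
```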

func (Node) ForEachChild

func (n Node) ForEachChild(f func(child *Node) bool)

ForEachChild calls f on each of n's children.

If f returns false, ForEachChild stops the iteration.
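The early-stop contract can be sketched as follows. As before, node is a simplified stand-in for html.Node, and forEachChild, tagsUntil, and sampleParent are hypothetical names used only for illustration.

```go
package main

import "fmt"

// node is a simplified stand-in for html.Node.
type node struct {
	Tag         string
	FirstChild  *node
	NextSibling *node
}

// forEachChild calls f on each direct child of n, in order,
// and stops as soon as f returns false.
func forEachChild(n *node, f func(child *node) bool) {
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if !f(c) {
			return
		}
	}
}

// tagsUntil collects the tags of n's children, stopping after the first
// child whose tag equals stop.
func tagsUntil(n *node, stop string) []string {
	var tags []string
	forEachChild(n, func(child *node) bool {
		tags = append(tags, child.Tag)
		return child.Tag != stop // returning false ends the iteration
	})
	return tags
}

// sampleParent builds body with children header, main, footer.
func sampleParent() *node {
	footer := &node{Tag: "footer"}
	main := &node{Tag: "main", NextSibling: footer}
	header := &node{Tag: "header", NextSibling: main}
	return &node{Tag: "body", FirstChild: header}
}

func main() {
	fmt.Println(tagsUntil(sampleParent(), "main"))
}
```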

func (*Node) GetAMPurl

func (n *Node) GetAMPurl() string

GetAMPurl returns the amp URL of the document, if any.

func (*Node) GetLang

func (n *Node) GetLang() string

GetLang returns the lang attribute of html node, if any.

func (*Node) GetTitle

func (n *Node) GetTitle() (title string)

GetTitle returns the title of the document, if any.

Note that if og:title exists in the meta header, it's preferred over title.

func (*Node) IsAMP

func (n *Node) IsAMP() bool

IsAMP returns true if n is an AMP HTML document.

func (*Node) Readable

func (n *Node) Readable(ctx context.Context, args ReadableArgs) (*html.Node, map[string]io.Reader, error)

Readable strips node n into a readable one, with all images downloaded and replaced.

type ReadableArgs

type ReadableArgs struct {
	// Base URL of the document, used in case the image URLs are relative.
	BaseURL *url.URL

	// User-Agent to be used to download images.
	UserAgent string

	// Directory prefix for downloaded images.
	ImagesDir string

	// If Grayscale is set to true,
	// all images will be grayscaled and encoded as jpegs.
	//
	// If any error happens while grayscaling an image,
	// it is logged via Logger.
	Grayscale bool
	Logger    logger.Logger
}

ReadableArgs defines the args used by the Readable function.

Directories

Path Synopsis
appengine (module)
birds: Package birds generates HTML out of twitter threads.
cloudrun (module)
cmd
epubwriter (module)
debug (module)
grayscale: Package grayscale provides function to grayscale an image.
logger: Package logger provides a simple log interface that you can wrap whatever logging library you use into.
rmapi: Package rmapi implements reMarkable api, as described in https://github.com/splitbrain/ReMarkableAPI/wiki.
debug (module)
tgbot: Package tgbot provides some simple wrapping around telegram bot api.
ziputil: Package ziputil provides some utility functions for zip archive handling.
